We study the asymptotic properties of Bayesian density estimators constructed
using normal mixtures of Dirichlet process priors where the random probability
measure has been integrated out, so that the mixture is induced by the Polya urn
scheme (Blackwell and MacQueen (1973)). Thus, the density estimators we consider
consist of at most a finite number of mixture components for a given sample size,
even though the number of mixture components increases with increasing sample
size. This is in contrast with the existing works on Bayesian density estimation
(for example, Ghosal and van der Vaart (2007)) where the random measure is
retained, while parameters arising from the random measure are assumed to be
integrated out (in principle). Within our marginalized mixture framework, we
consider two separate density estimators: that of Escobar and West (1995) and
that introduced by Bhattacharya (2008). The latter mixture model specifies a
bound on the number of mixture components, preventing it from growing arbitrarily
large with the sample size. We study the posterior rates of convergence of the
mean integrated squared error (MISE) for both kinds of mixtures and show that the
MISE corresponding to Bhattacharya (2008) converges to zero at a much faster rate
compared to that of Escobar and West (1995). We also show that with proper, but
plausible, choices of the free parameters of our MISE bounds the rate of
convergence can be made smaller than the best rate of Ghosal and van der Vaart
(2007) given by $n^{-2/5}(\log n)^{4/5}$, and in fact, can be made smaller than
the optimal frequentist rate $n^{-2/5}$. Apart from these we study and compare
the MISE convergence rates of the two models in the case of the "large p small n"
problem. Furthermore, we show that while the model of Escobar and West (1995) can
converge to a wrong model under certain conditions, much stronger conditions are
necessary for Bhattacharya (2008) to converge to a wrong model. Finally, we
consider a modified version of Bhattacharya (2008) but demonstrate that all the
results remain the same under the modified version.
1. Introduction. In recent years, the use of nonparametric priors in the context
of Bayesian density estimation arising out of mixtures has received wide attention
thanks to their flexibility and advances in computational methods. The study of
nonparametric priors in the context of Bayesian density estimators was initiated
by Ferguson (1983) and Lo (1984), who derived the associated posterior and
predictive distributions. The set-up for nonparametric Bayesian density estimation
can be represented in the following hierarchical form: for $i = 1, \ldots, n$,
$Y_i \sim K(\cdot \mid \theta_i)$ independently; $\theta_1, \ldots, \theta_n \sim F$
and $F \sim \Pi$, where $F$ is a random probability measure and $\Pi$ is some
appropriate nonparametric prior distribution on the set of probability measures.
An important choice of $\Pi$ is of course the Dirichlet process prior, for which
the set-up of Lo (1984) boils down to the Escobar and West (1995) (henceforth EW)
model.

Though very well known, the EW model has several drawbacks in terms of
computational efficiency (see Mukhopadhyay, Roy and Bhattacharya (2012),
Mukhopadhyay, Bhattacharya and Dihidar (2011)) which manifest themselves
particularly when applied to massive data. Bhattacharya (2008) (henceforth SB)
proposed a new model which is shown to bypass the problems of the EW model (see
Mukhopadhyay, Bhattacharya and Dihidar (2011) and Mukhopadhyay, Roy and
Bhattacharya (2012)). The essence of the SB model lies in the assumption that data
points are independently and identically distributed (iid) as an $M$-component
mixture model, where the parameters of the mixture components, which we denote by
$\theta_1, \ldots, \theta_M$, are samples from a Dirichlet process. Thus, the total
number of distinct components of the mixture is bounded above by $M$. If $M$ is
chosen to be much less than $n$, the number of data points, then this idea entails
great computational efficiency compared to the EW model, particularly in the case
of massive data. Moreover, if $M = n$, and $Y_i$ is associated with $\theta_i$ for
every $i$, then the SB model reduces to the EW model, showing that the EW model is
a special case of the SB model. In this paper, we will assume $M$ to be increasing
with $n$; in fact, our subsequent asymptotic calculations show that $M$ increasing
at a rate slower than $\sqrt{n}$ is adequate. To reflect the dependence of $M$ on
$n$, henceforth we shall write $M_n$. Thus, the dimensionality of both the EW and
the SB model grows with $n$, although in the latter case it grows at a slower rate.

However, the problem of comparison between these two models with respect to 
asymptotic posterior convergence rates of the random density estimators has not 
been addressed. Although Ghosal and van der Vaart (2007) addressed the consis- 
tency issues of the EW model and obtained the posterior convergence rates, these 
are obtained under the assumption that randomness of the kernel mixture is induced 
by the random measure $F$ while the parameters $\theta_i$ are integrated out (in principle).
Consequently, hitherto all the asymptotic calculations on random densities are done 
with respect to the random measure F. 
Our interest, on the other hand, lies in posterior consistency and convergence
rates of random densities under the set-up where randomness of the kernel mixture 
is induced via the parameters, assuming that the random measure F has been inte- 
grated out. Indeed, in almost all practical settings, F, which is typically assumed 
to have the Dirichlet process prior, is not of interest and is integrated out; the ker- 
nel mixture is rendered random only through the unknown parameters, which, in 
the case of Dirichlet process prior, follow the Polya urn scheme. Both EW and SB 
exploit the properties of the Polya urn distribution to construct their models and 
associated methodologies. 
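
The Polya urn scheme referred to above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `polya_urn`, the base-measure sampler `base_draw`, and the numerical values are assumptions made for the example. It uses the standard sequential form of the urn, in which the $(i+1)$-th parameter is a fresh draw from $G_0$ with probability $\alpha/(\alpha+i)$ and otherwise a copy of one of the $i$ earlier values.

```python
import random

def polya_urn(n, alpha, base_draw, rng):
    """Draw theta_1..theta_n from the Polya urn implied by a DP(alpha * G0) prior.

    base_draw is a hypothetical helper: a callable rng -> one draw from G0.
    """
    thetas = []
    for i in range(n):
        # With probability alpha / (alpha + i) take a new value from G0;
        # otherwise repeat one of the i previous values, chosen uniformly.
        if rng.random() < alpha / (alpha + i):
            thetas.append(base_draw(rng))
        else:
            thetas.append(rng.choice(thetas))
    return thetas

rng = random.Random(0)
draws = polya_urn(200, alpha=2.0, base_draw=lambda r: r.gauss(0.0, 1.0), rng=rng)
n_distinct = len(set(draws))  # number of distinct mixture components induced
```

Because ties occur with positive probability, `n_distinct` is typically much smaller than the sample size, which is the clustering property both models exploit.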

Assuming the aforementioned set-up where F is integrated out, in this paper 
we investigate posterior consistency and convergence rates of EW and SB models, 
using MISE as the divergence measure between the kernel mixture and the true 
density which is assumed to generate the data. In particular, we show that the model 
of SB converges much faster than that of EW, in terms of MISE based on the 
posteriors. 

Optimization of the MISE convergence rates helps obtain asymptotically optimal
choices of several important prior parameters driving the models. This is important 
since in applications the prior parameters are almost always chosen by ad hoc 
means. 

We also study the rates of convergence of the two competing models in the 
"large p small n" set-up, and show how the priors must be adjusted to guarantee 
consistency in this case. As before, in this set-up as well the SB model beats the 
EW model in terms of faster rate of MISE-based posterior convergence.

There is also an important question regarding the conditions leading to conver- 
gence of the mixtures to the wrong models (that is, models that did not generate 
the data). We show that the model of EW can converge to a wrong model under 
relatively weak conditions, whereas much stronger conditions must be enforced to 
get the SB model to converge to the wrong model. 

Furthermore, we consider a modified version of SB's model; however, as we 
demonstrate, all the results remain intact under this version. 

The rest of the work is organized as follows. In Sections 2 and 3 we provide 
details of the explicit forms of the EW-based and the SB-based density estimators
and provide discussions on the assumptions used in our subsequent asymptotic 
calculations. The assumptions regarding the true, data-generating distribution are 
provided in Section 4. Section 5 provides details of our Bayesian MISE-based 
divergence measure used for our asymptotic calculations. Sections 6 and 7 provide 
results showing convergence of the posterior expectations of the EW-based and 
the SB-based density estimators, respectively, to the same true distribution, also 
providing the rates of convergence. In Sections 8 and 9 we compute the a poste- 
riori Bayesian MISE-based rates of convergence of the EW and the SB model. 
In Section 10 the posterior rates of the two models are compared, and the "large
p small n problem" is investigated in Section 12. In Section 13, the conditions
under which the models may converge to wrong distributions are investigated. The
convergence rate of a modified version of the SB model is calculated in Section 14.
Proofs of most of the results are provided in the supplement Mukhopadhyay and 
Bhattacharya (2012b), whose sections have the prefix "S-" when referred to in this 
paper. Additionally, in Section S-9 of the supplement, we study MISE rates of 
convergence with respect to the prior: since the number of parameters in these two
models depends upon the sample size (in the EW model the number of parameters 
is the same as the sample size, and in the SB model the number of parameters is M, 
which we will assume to increase with the sample size), it is possible, by increas- 
ing the number of parameters, to obtain prior rates of convergence. We compare 
the MISE-based prior and the posterior convergence rates in Section S-10 of the 
supplement. 

2. The EW model and the associated assumptions. We assume the following version
of the EW model: $[Y_i \mid \theta_i, \sigma] \sim N(\theta_i, \sigma^2)$ (normal
distribution with mean $\theta_i$ and variance $\sigma^2$), $\theta_i \sim F$ for
all $i$, $F \sim D(\alpha G_0)$, where we assume $G_0(\theta) = N(\mu_0, \sigma_0^2)$,
where $\mu_0$ and $\sigma_0$ are known. We let the parameter $\alpha$ increase with
$n$, the sample size. Further we assume a sequence of priors on $\sigma$ as
$\sigma/\sigma_n \sim G$, where $\sigma_n$ is a sequence of constants such that
$0 < \sigma_n < K^*$, where $K^*$ is finite; $\sigma_n \to 0$, and $G$ is fixed.
Let $\sigma \sim G_n$, where $G_n(s) = G(s/\sigma_n)$. This assumption regarding
the prior of $\sigma$ is very similar to that of Ghosal and van der Vaart (2007).
Following Ghosal and van der Vaart (2007) we also assume that
$P(\sigma > \sigma_n) = O(\epsilon_n)$, where $\epsilon_n \to 0$. As we make precise
later, we let the choice of $\epsilon_n$ depend upon the other prior parameters. We
let $\Theta_n = (\theta_1, \ldots, \theta_n)'$. It is important to note that, for a
particular value of $n$, we have a particular form of prior of $\sigma$, given by
$G_n$. Thus, unlike a single sequence of random variables we have a double array of
random variables, as illustrated below:

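$$
\begin{array}{cccc}
Y_{11} & Y_{12} & \cdots & Y_{1k_1} \\
Y_{21} & Y_{22} & \cdots & Y_{2k_2} \\
\vdots & \vdots & & \vdots \\
Y_{n1} & Y_{n2} & \cdots & Y_{nk_n}
\end{array}
$$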



For each $n$, there are $k_n$ random variables $\{Y_{ni}; i = 1, \ldots, k_n\}$. It
is assumed that $k_n \to \infty$ as $n \to \infty$. In the array the $Y$'s are
assumed to be independent of each other. In the case of the EW model, $k_n = n$, a
situation in which the double array of random variables is referred to as the
triangular array of random variables (see, for example, Serfling (1980)). In the EW
model, for each $n$, $Y_{ni} \sim N(\theta_i, \sigma^2)$ and the prior
specifications on $\theta_i$ and $\sigma$ are the same as stated in the beginning
of this section. For notational simplicity we drop the suffix "n" in
$\{Y_{ni}; i = 1, \ldots, n\}$ and simply denote it by $\{Y_1, \ldots, Y_n\}$.

We study the posterior-based asymptotic properties of prior predictive densities 
of the following form: 

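(1) $\hat f_{EW}(y \mid \Theta_n, \sigma) = \frac{\alpha}{\alpha+n} A_n + \frac{1}{\alpha+n} \sum_{i=1}^{n} \frac{1}{(\sigma+k)} \phi\left(\frac{y-\theta_i}{\sigma+k}\right)$,

where $\phi(\cdot)$ is the standard normal density, and
$A_n = \int_{\theta} \frac{1}{(\sigma+k)} \phi\left(\frac{y-\theta}{\sigma+k}\right) dG_0(\theta)$.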

Addition of the positive constant $k$ to the standard deviation is essentially a
device to prevent the variance from getting arbitrarily close to zero. It is
important to note the difference between the model assumption for the data and the
density estimator of our interest given by (1); even though the latter adds $k$ to
$\sigma$, the former does not consider addition of any positive constant to
$\sigma$. In spite of slightly inflating the variance in (1), the form of the true
distribution (4) to which (1) converges a posteriori is not severely restricted.
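
As an illustration of the estimator (1), the sketch below evaluates it for given parameter values. It is only a sketch under stated assumptions: the function names `phi`, `a_n`, and `f_ew` are hypothetical, and $A_n$ is computed with the closed form that holds when $G_0 = N(\mu_0, \sigma_0^2)$, namely a normal density in $y$ with mean $\mu_0$ and variance $(\sigma+k)^2 + \sigma_0^2$ (the convolution of two normals).

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def a_n(y, sigma, k, mu0, sigma0):
    # Integral of (1/(sigma+k)) * phi((y - theta)/(sigma+k)) against
    # G0 = N(mu0, sigma0^2): a normal density with variance (sigma+k)^2 + sigma0^2.
    s = math.sqrt((sigma + k) ** 2 + sigma0 ** 2)
    return phi((y - mu0) / s) / s

def f_ew(y, thetas, sigma, k, alpha, mu0, sigma0):
    """Evaluate the EW-type estimator (1) at y for given theta_1..theta_n and sigma."""
    n = len(thetas)
    kernel_sum = sum(phi((y - t) / (sigma + k)) / (sigma + k) for t in thetas)
    return alpha / (alpha + n) * a_n(y, sigma, k, mu0, sigma0) + kernel_sum / (alpha + n)

# Riemann-sum check that the estimator integrates to one (illustrative values).
step = 0.01
total = sum(
    f_ew(-20.0 + i * step, [-1.0, 0.5, 2.0], 0.3, 0.2, 1.5, 0.0, 1.0) * step
    for i in range(4001)
)
```

The weights $\alpha/(\alpha+n)$ and $n/(\alpha+n)$ sum to one, so `total` should be numerically indistinguishable from 1, confirming that (1) is a proper density for every fixed $(\Theta_n, \sigma)$.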

3. SB model and the associated assumptions. As in the case of the EW model, here
we consider the following density estimator:

(2) $\hat f_{SB}(y \mid \Theta_{M_n}, \sigma) = \frac{1}{M_n} \sum_{i=1}^{M_n} \frac{1}{(\sigma+k)} \phi\left(\frac{y-\theta_i}{\sigma+k}\right)$,

where $M_n$ is the maximum number of distinct components the mixture model can
have and $\Theta_{M_n} = (\theta_1, \theta_2, \ldots, \theta_{M_n})'$. Define
$Z_i = j$ if $Y_i$ comes from the $j$-th component of the mixture model. Denote $z$
as the realized vector of $Z$. We make the same assumptions regarding $\alpha$ and
$\sigma$ as in EW. Additionally, we let $M_n$ increase with $n$. As in the case of
the EW model, $Y_n$ for the SB model also forms a triangular array.

To perform our asymptotic calculations with respect to the SB model, we need
to shed light on an issue associated with the frequentist estimate of
$\sigma_T^2$, the variance of the true density $f_0$ generating the data. The
assumptions on the true distribution $f_0$ are provided in Section 4.

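Let $n_j = \#\{t : z_t = j\}$, $n = \sum_{j=1}^{M_n} n_j$,
$\hat\sigma^2_{T,n} = \frac{1}{n}\sum_{j=1}^{M_n}\sum_{t:z_t=j}(Y_t - \bar Y_j)^2 = \frac{\sum_{j=1}^{M_n} n_j s^2_{j,n}}{n}$,
where $s^2_{j,n} = \frac{\sum_{t:z_t=j}(Y_t - \bar Y_j)^2}{n_j}$ and
$\bar Y_j = \frac{\sum_{t:z_t=j} Y_t}{n_j}$. Now, defining
$\bar Y = \frac{\sum_{i=1}^{n} Y_i}{n}$, we note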



imsart-aos ver. 2012/04/10 file: mise.tex date: March 4, 2013 



6 



MUKHOPADHYAY AND BHATTACHARYA 



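that $\frac{1}{n}\sum_{i=1}^{n}(Y_i - \bar Y)^2$ can be expressed, for any
allocation vector $z = (z_1, \ldots, z_n)'$, as

$$
\frac{1}{n}\sum_{i=1}^{n}(Y_i - \bar Y)^2
= \frac{1}{n}\sum_{j=1}^{M_n}\sum_{t:z_t=j}(Y_t - \bar Y)^2
= \frac{1}{n}\sum_{j=1}^{M_n} n_j(\bar Y_j - \bar Y)^2 + \hat\sigma^2_{T,n}.
$$

Since $\frac{1}{n}\sum_{i=1}^{n}(Y_i - \bar Y)^2 \to \sigma_T^2$, a.s., it would
follow from the above representation that $\hat\sigma^2_{T,n} \to \sigma_T^2$, a.s.
if it can be shown that $\frac{1}{n}\sum_{j=1}^{M_n} n_j(\bar Y_j - \bar Y)^2 \to 0$,
a.s. The following lemma guarantees that it is indeed the case.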

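Lemma 3.1. Under the data generating true density $f_0$,
$\frac{1}{n}\sum_{j=1}^{M_n} n_j(\bar Y_j - \bar Y)^2 \to 0$, a.s. if
$1 < M_n \prec O(n)$ (for any two sequences $\{a_n^{(1)}\}$ and $\{a_n^{(2)}\}$ we
say $a_n^{(1)} \prec a_n^{(2)}$ if $a_n^{(1)}/a_n^{(2)} \to 0$).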

Proof. See Section S-l.l of the supplement. 

□ 
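
The between/within decomposition used before Lemma 3.1 can be checked numerically. The sketch below is illustrative only: the helper name `within_between`, the standard normal data, and the uniformly random allocation vector are assumptions made for the example, not part of the model specification.

```python
import random

def within_between(y, z, m):
    """Split (1/n) * sum (Y_i - Ybar)^2 into between- and within-component parts.

    y : data, z : allocation labels in {0, ..., m-1} (hypothetical 0-based labels).
    Returns (total, between, within), each already divided by n.
    """
    n = len(y)
    ybar = sum(y) / n
    total = sum((v - ybar) ** 2 for v in y) / n
    between = 0.0
    within = 0.0
    for j in range(m):
        grp = [v for v, lab in zip(y, z) if lab == j]
        if not grp:
            continue  # an empty component contributes nothing to either part
        mean_j = sum(grp) / len(grp)
        between += len(grp) * (mean_j - ybar) ** 2 / n
        within += sum((v - mean_j) ** 2 for v in grp) / n
    return total, between, within

rng = random.Random(1)
y = [rng.gauss(0.0, 1.0) for _ in range(500)]
z = [rng.randrange(10) for _ in range(500)]
total, between, within = within_between(y, z, 10)
```

Here `within` plays the role of $\hat\sigma^2_{T,n}$, and `total` equals `between + within` up to rounding for any allocation vector.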

From Lemma 3.1 and the fact that $\frac{1}{n}\sum_{i=1}^{n}(Y_i - \bar Y)^2 \to \sigma_T^2$,
a.s., we can conclude $\hat\sigma^2_{T,n} \to \sigma_T^2$, a.s. So, as
$n \to \infty$, $n\hat\sigma^2_{T,n} \sim n\sigma_T^2$, a.s. (for any two sequences
$\{a_n^{(1)}\}$ and $\{a_n^{(2)}\}$, $a_n^{(1)} \sim a_n^{(2)}$ denotes
$\lim_{n\to\infty} a_n^{(1)}/a_n^{(2)} = 1$), implying that as $n \to \infty$,
$\sum_{j=1}^{M_n}\sum_{t:z_t=j}(Y_t - \bar Y_j)^2$ becomes independent of $z$. We
begin by writing $n\hat\sigma^2_{T,n} \sim C_n$, where $0 < C_n/n < N$ (for some
sufficiently large constant $N$) is a bounded sequence independent of $z$ and has
the same limiting behaviour as $\hat\sigma^2_{T,n}$. We will perform our
calculations when for each $n$, $|Y_i| < a$; $i = 1, \ldots, n$, for some
sufficiently large constant $a > 0$. In this case $0 < \hat\sigma^2_{T,n} < 4a^2$.
Thus, we may choose $N = 4a^2$.

To prove many of our results on the SB model, we will assume suitable conditions
on $C_n$. The conditions on $C_n$ that we assume are reasonable, and are consistent
with the above discussion. In keeping with the above discussion, the proof of
Lemma 7.4 requires us to assume that for large $n$, $C_n/n$ is greater than or
equal to the order of $\sigma_n^2 \log(1/\epsilon_n)$, unless $\epsilon_n$ tends to
zero at too fast a rate compared to $\sigma_n^2$. In the latter situation $\sigma$
is much harder to estimate since in that case the sample size $n$ will be smaller,
$\sigma_n$ larger, and $\sigma$ can be anything between 0 and $\sigma_n$, which is
to be estimated with a relatively small sample. Hence, although in the former case
we expect $C_n/n$ to get close to zero for large $n$, in the latter case $C_n/n$
may remain bounded away from zero.
4. Assumptions regarding the true distribution. In this paper we assume
that the true, data generating distribution is of the following form:

(3) $f_0(y) = \int_{-a-c}^{a+c} \frac{1}{k}\phi\left(\frac{y-\theta}{k}\right) dF_0(\theta)$,

where $k$ is some known positive constant, and $F_0$ is an unknown distribution
compactly supported on $[-a-c, a+c]$, for some constants $a > 0$ and $c > 0$.

Note that, using the mean value theorem for integrals, also known as the general
mean value theorem (GMVT), we can rewrite $f_0(y)$ as

(4) $f_0(y) = \frac{1}{k}\phi\left(\frac{y-\theta^*(y)}{k}\right)$,

where $\theta^*(y) \in (-a-c, a+c)$ may depend upon $y$.

The results in the following two sections show that the posterior expectations
of the Bayesian density estimators corresponding to the EW and the SB models,
given by (1) and (2), converge to the form given by (4).
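
To make the passage from (3) to (4) concrete, the following sketch finds a value $\theta^*(y)$ numerically when $F_0$ is a discrete distribution. Everything specific here is an assumption for illustration: the two-point $F_0$, the value of $k$, and the helper names `kernel`, `f0`, and `theta_star`. The idea is that $f_0(y)$ is a weighted average of kernel values at the atoms, so by continuity some $\theta$ between the atom with the smallest kernel value and the atom with the largest reproduces $f_0(y)$ exactly; bisection locates one such point.

```python
import math

def kernel(y, theta, k):
    """(1/k) * phi((y - theta)/k) with phi the standard normal density."""
    x = (y - theta) / k
    return math.exp(-0.5 * x * x) / (k * math.sqrt(2.0 * math.pi))

def f0(y, atoms, probs, k):
    """True density (3) for a discrete F0 placing mass probs[i] at atoms[i]."""
    return sum(p * kernel(y, t, k) for t, p in zip(atoms, probs))

def theta_star(y, atoms, probs, k, iters=200):
    """Find theta with kernel(y, theta, k) == f0(y) by bisection."""
    target = f0(y, atoms, probs, k)
    # kernel at `a` is <= target and kernel at `b` is >= target.
    a = min(atoms, key=lambda t: kernel(y, t, k))
    b = max(atoms, key=lambda t: kernel(y, t, k))
    for _ in range(iters):
        mid = 0.5 * (a + b)
        if kernel(y, mid, k) <= target:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)

atoms, probs, k = [-1.0, 1.0], [0.3, 0.7], 0.5
th = theta_star(0.2, atoms, probs, k)
```

The returned point lies between the two atoms, hence inside the support of $F_0$, and it depends on $y$, as noted after (4).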

5. Bayesian MISE based posterior convergence. Assuming that $\hat f_n$ is an
estimate of the true density $f_0$ based on the observed data
$Y_n = (Y_1, \ldots, Y_n)'$, the MISE of $\hat f_n$ is given by

(5) $MISE = \int E\left(\hat f_n(y) - f_0(y)\right)^2 dy$,

where the expectation is with respect to the data $Y_n$. In our Bayesian context,
we assume that the density estimate of $f_0(y)$ is $\hat f(y \mid \Theta)$, where
$\Theta$ is an unknown set of parameters. We consider the following analogue of the
above classical definition:

(6) $MISE_1 = \int_y E\left(\hat f(y \mid \Theta) - f_0(y)\right)^2 dy = \int_y \int_\Theta \left\{\hat f(y \mid \Theta) - f_0(y)\right\}^2 [\Theta \mid Y_n]\, d\Theta\, dy$,

that is, the expectation is with respect to the posterior of $\Theta$, generically
denoted by $[\Theta \mid Y_n]$. We further modify definition (6) by considering a
weighted version, given by

(7) $MISE_2 = \int_y E\left(\hat f(y \mid \Theta) - f_0(y)\right)^2 f_0(y)\, dy = \int_y \int_\Theta \left\{\hat f(y \mid \Theta) - f_0(y)\right\}^2 [\Theta \mid Y_n]\, f_0(y)\, d\Theta\, dy$,
where $f_0(y)$ is the weight associated with $\{\hat f(y \mid \Theta) - f_0(y)\}^2$.
Thus, $f_0(y)$ down-weights those squared error terms
$\{\hat f(y \mid \Theta) - f_0(y)\}^2$ which correspond to extreme values of $y$.
Such weighting strategies, which use the true distribution as the weight, are not
uncommon in the statistical literature. The well-known Cramér-von Mises test
statistic (see, for example, Serfling (1980)) is a case in point. For notational
simplicity we refer to $MISE_2$ simply as $MISE$.

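MISE of the form (7) can be expressed conveniently as

(8) $MISE = \int_y \mathrm{Var}\left(\hat f(y \mid \Theta) \mid Y_n\right) f_0(y)\, dy + \int_y \left\{\mathrm{Bias}\left(\hat f(y \mid \Theta) \mid Y_n\right)\right\}^2 f_0(y)\, dy$,

where $\mathrm{Var}(\hat f(y \mid \Theta) \mid Y_n)$ denotes the variance of
$\hat f(y \mid \Theta)$ with respect to the posterior $[\Theta \mid Y_n]$ and

(9) $\mathrm{Bias}\left(\hat f(y \mid \Theta) \mid Y_n\right) = E\left(\hat f(y \mid \Theta) \mid Y_n\right) - f_0(y)$,

$E(\hat f(y \mid \Theta) \mid Y_n)$ denoting the expectation of
$\hat f(y \mid \Theta)$ with respect to $[\Theta \mid Y_n]$.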
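
The variance-plus-squared-bias form (8) of the weighted criterion (7) can be verified on a toy example. The sketch below is an assumption-laden illustration rather than the paper's computation: the "posterior" is a hypothetical three-point distribution over shifted normal densities, the integral over $y$ is a Riemann sum, and the helper name `mise_parts` is invented for the example.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def mise_parts(draws, weights, f0, grid, step):
    """Return (MISE, variance part, squared-bias part) of the f0-weighted criterion.

    draws   : callables y -> f(y | Theta), one per (hypothetical) posterior draw
    weights : posterior probabilities of the draws, summing to one
    grid    : equally spaced y values with spacing `step` (Riemann sum)
    """
    mise = var_part = bias_part = 0.0
    for y in grid:
        vals = [f(y) for f in draws]
        post_mean = sum(w * v for w, v in zip(weights, vals))
        post_var = sum(w * (v - post_mean) ** 2 for w, v in zip(weights, vals))
        sq_err = sum(w * (v - f0(y)) ** 2 for w, v in zip(weights, vals))
        wt = f0(y) * step  # weight by the true density, as in (7)
        mise += sq_err * wt
        var_part += post_var * wt
        bias_part += (post_mean - f0(y)) ** 2 * wt
    return mise, var_part, bias_part

f0 = phi
draws = [lambda y, m=m: phi(y - m) for m in (-0.3, 0.1, 0.4)]
weights = [0.2, 0.5, 0.3]
grid = [-8.0 + 0.01 * i for i in range(1601)]
mise, var_part, bias_part = mise_parts(draws, weights, f0, grid, 0.01)
```

The identity holds pointwise in $y$, so `mise` agrees with `var_part + bias_part` up to floating-point error regardless of the grid used.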

For the EW and the SB models we will denote the respective MISE's as
$MISE(EW)$ and $MISE(SB)$, respectively. Let
$S_n = \{Y_n : \max_{1 \le i \le n} |Y_i| < a\}$, and let $I_{S_n}$ denote the
indicator function of the set $S_n$. Also, let $E_0^n$ denote the expectation with
respect to $f_0$, the true distribution of $Y_n$. We will compute and compare the
rates of convergence to 0 of $E_0^n[MISE(EW) I_{S_n}]$ and
$E_0^n[MISE(SB) I_{S_n}]$ when the true density $f_0$ is estimated using the EW
model and the SB model, but with the same set of data for any given sample size. In
particular, we show that $E_0^n[MISE(SB) I_{S_n}] / E_0^n[MISE(EW) I_{S_n}] \to 0$.

Before proceeding to the MISE calculations, we first investigate the asymptotic 
forms of the posterior expectations of the EW-based and the SB-based density
estimators given by (1) and (2), respectively. This we do in the next two sections.
These results, apart from being interesting in their own rights and showing explic- 
itly the form of the true distribution (the asymptotic form of posterior expected 
density estimators), actually provide the orders of the bias terms of the correspond- 
ing MISE calculations, recalling from (8) that MISE can be broken up into a 
variance part and a bias part. 

Henceforth we will denote cLGq{x) by Hq. In proving most of our results 

the GMVT will play a very important role. 

6. Convergence of the posterior expectation of the EW density estimator to 
the true distribution. 
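THEOREM 6.1. Under the assumptions stated in Sections 2, 4, and 5, for any
$Y_n \in S_n$,

(10) $\sup_{-\infty < y < \infty} \left| E\left(\hat f_{EW}(y \mid \Theta_n, \sigma) \mid Y_n\right) - f_0(y) \right| = O\left(\frac{\alpha}{\alpha+n} + \frac{n}{\alpha+n}\left(B_n + \epsilon^*_n + \sigma_n\right)\right)$,

where

(11) $B_n = (\alpha+n)\exp\left(-\frac{c^2}{4\sigma_n^2}\right)$

and

(12) $\epsilon^*_n = \frac{\epsilon_n}{1-\epsilon_n}\exp\left(\frac{n(a+c_1)^2}{2b_n^2}\right)\frac{(\alpha+n)^n}{\alpha^n H_0^n}$.

In (12), $\{b_n\}$ is a sequence of positive numbers such that $0 < b_n < \sigma_n$
for all $n$, $\alpha = O(n^\omega)$, $0 < \omega < 1$, $\mu^*(y) \in (-c_1, c_1)$
for each $y$, and $f_0(y) = \frac{1}{k}\phi\left(\frac{y-\mu^*(y)}{k}\right)$ is a
well-defined density. The constant involved in the order (10) is independent of
$Y_n$.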

PROOF. See Section S-2.3 of the supplement. The proof depends upon several 
lemmas, stated below. □ 



Remark 1: We will choose $\epsilon_n$ such that $\epsilon^*_n \to 0$; in other
words, we choose $\epsilon_n$ such that
$\frac{\epsilon_n}{1-\epsilon_n}\exp\left(\frac{n(a+c_1)^2}{2b_n^2}\right)\frac{(\alpha+n)^n}{\alpha^n H_0^n} \to 0$.



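Remark 2: Note that, since the right hand side of (10) does not depend upon
$Y_n \in S_n$, it follows that

$$
E_0^n\left[\sup_{-\infty < y < \infty} \left| E\left(\hat f_{EW}(y \mid \Theta_n, \sigma) \mid Y_n\right) - f_0(y) \right| I_{S_n}\right]
= O\left(\frac{\alpha}{\alpha+n} + \frac{n}{\alpha+n}\left(B_n + \epsilon^*_n + \sigma_n\right)\right).
$$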



Remark 3: The proof of Theorem 6.1 shows that for each $y$, $\mu^*(y)$ corresponds
to $\sigma = 0$ (the limit of the sequence $\sigma_n$), and so is non-random.

The terms $\epsilon^*_n$ and $B_n$ arise as the orders of the posterior
probabilities $P(\sigma > \sigma_n \mid Y_n)$ and
$P(\theta_i \in [-a-c, a+c]^c, \sigma \le \sigma_n \mid Y_n)$, respectively. The
first term in the order (19) of Theorem 8.1 is contributed by the order of the term
$\frac{\alpha}{\alpha+n}A_n$, where $A_n$ is already defined in connection with
(1). These results are used for proving Theorem 6.1 and also play important roles
in proving Theorem 8.1, our main theorem on MISE related to the EW model. They are
made precise in the following lemmas.

LEMMA 6.2. Let $\{b_n\}$ and $\{\sigma_n\}$ be sequences of positive numbers such
that $\sigma_n \to 0$, $0 < b_n < \sigma_n$ for all $n$, and
$P(\sigma > \sigma_n) = O(\epsilon_n)$, for some sequence of positive constants
$\epsilon_n \to 0$. If $|Y_i| < a$; $i = 1, \ldots, n$, then

(13) $P(\sigma > \sigma_n \mid Y_n) = O(\epsilon^*_n)$.

PROOF. See Section S-2.1 of the supplement. □ 

Lemma 6.3. Under the same assumptions as Lemma 6.2 and < a n < a, the 
following holds: 

(14) $P\left(\theta_i \in [-a-c, a+c]^c, \sigma \le \sigma_n \mid Y_n\right) \le \frac{C_0}{\delta} B_n$,

where $C_0 = \sup_\sigma\{\sigma^{-1}\exp(-c^2/4\sigma^2)\}$, and $\delta$ is the
lower bound of the density of $G_0$ on $[-a-c, a+c]$.

PROOF. This proof is similar to that of Lemma 11 of Ghosal and van der Vaart
(2007). □ 

Lemma 6.4. $\frac{\alpha}{\alpha+n}A_n = O\left(\frac{\alpha}{\alpha+n}\right)$, for $\alpha = O(n^\omega)$, $0 < \omega < 1$.



PROOF. See Section S-2.2 of the supplement. 



□ 



7. Convergence of the posterior expectation of the SB density estimator to 
the true distribution. 



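THEOREM 7.1. Under the assumptions stated in Sections 3, 4, and 5, the following
holds for any $Y_n \in S_n$:

(15) $\sup_{-\infty < y < \infty} \left| E\left(\hat f_{SB}(y \mid \Theta_{M_n}, \sigma) \mid Y_n\right) - \frac{1}{k}\phi\left(\frac{y-\theta^*(y)}{k}\right) \right| = O\left(M_n B_{M_n} + \left(1-\frac{1}{M_n}\right)^n\left(\frac{\alpha+M_n}{\alpha}\right)^{M_n} + \epsilon^*_{M_n} + \sigma_n\right)$,

where
$\epsilon^*_{M_n} = \frac{\epsilon_n}{1-\epsilon_n}\exp\left(\frac{n(a+c_1)^2}{2b_n^2}\right)\frac{(\alpha+M_n)^{M_n}}{\alpha^{M_n}H_0^{M_n}}$
is defined analogously to $\epsilon^*_n$ in the remarks associated with Theorem
6.1. Also, for every $y$, $\theta^*(y) \in (-a-c, a+c)$, and $E_0^n$ denotes the
expectation with respect to the true distribution of $Y_n$, given by
$f_0(y) = \frac{1}{k}\phi\left(\frac{y-\theta^*(y)}{k}\right)$. The constant
involved in the above order is independent of $Y_n$.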

PROOF. See Section S-3.4 of the supplement. The proof depends upon new 
ideas related to breaking up of relevant integrals and several lemmas, which are 
discussed and stated below. □ 

The remarks made in connection with Theorem 6.1 associated with the EW 
model are also applicable to Theorem 7.1 in connection with the SB model. In 
particular, we will choose $\epsilon_n$, $\alpha$, $M_n$ such that the right hand side of (15) goes to
zero. 

The proof of Theorem 7.1 requires us to introduce some necessary concepts and
technicalities. These new ideas are needed for the SB model and not for the EW
model since the latter is a much less complex model than the former. In particular,
note that unlike the EW case where each $\theta_i$ is represented in
$L(\Theta_{M_n}, Y_n, z)$, $\theta_i$ in the SB model may or may not be allocated
to $Y_l$ for some $l$, that is, there can exist $z$ such that $z_l \neq i$,
$l = 1, \ldots, n$. Suppose that $R_1^* = \{z : \text{no } z_l = i\}$ and
$R_2^* = \{z : \text{at least one } z_l = i\}$. Note that
$\#R_1^* = (M_n - 1)^n$ and $\#R_2^* = M_n^n - (M_n - 1)^n$.
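
The two counts just stated can be confirmed by brute-force enumeration for small cases. The sketch below is purely illustrative; the function name `count_allocations` and the small values of $n$ and $M_n$ are assumptions chosen so that full enumeration is feasible.

```python
from itertools import product

def count_allocations(n, m, i):
    """Count allocation vectors z in {1..m}^n that never use, or do use, component i.

    Returns (#R1, #R2) where R1 = {z : no z_l equals i} and
    R2 = {z : at least one z_l equals i}.
    """
    unused = sum(1 for z in product(range(1, m + 1), repeat=n) if i not in z)
    return unused, m ** n - unused

r1, r2 = count_allocations(4, 3, 1)  # n = 4 observations, M_n = 3 components
```

For $n = 4$ and $M_n = 3$ this gives $(M_n-1)^n = 16$ and $M_n^n - (M_n-1)^n = 65$, matching the closed forms.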

If $z \in R_1^*$, let $\Theta_z$ denote the set of $\theta_l$'s present in the
likelihood and let $\#\Theta_z = j$, where $j = 1, \ldots, (M_n - 1)$. By the
definition of $R_1^*$, $\theta_i$ is not present in the likelihood. Without loss of
generality let us assume that $\theta_1, \ldots, \theta_j$ are represented in the
likelihood $L(\Theta_{M_n}, z, Y_n)$. For obtaining bounds of
$L(\Theta_{M_n}, z, Y_n)$ it is enough to consider only $\Theta_z$. For
$z \in R_1^*$, we split the range of integration in the numerator in the following
way:



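$$
\begin{aligned}
&\int\int L(\Theta_{M_n}, z, Y_n)\, dG_n(\sigma)\, dH(\Theta_{M_n}) \\
&= \int\int_{\theta_1 \in [-a-c,a+c]^c} L(\Theta_{M_n}, z, Y_n)\, dG_n(\sigma)\, dH(\Theta_{M_n})
 + \int\int_{\theta_1 \in [-a-c,a+c]} L(\Theta_{M_n}, z, Y_n)\, dG_n(\sigma)\, dH(\Theta_{M_n}) \\
&= \int\int_{\theta_1 \in [-a-c,a+c]^c} L(\Theta_{M_n}, z, Y_n)\, dG_n(\sigma)\, dH(\Theta_{M_n}) \\
&\quad + \int\int\int_{\theta_1 \in [-a-c,a+c],\ \theta_2 \in [-a-c,a+c]^c} L(\Theta_{M_n}, z, Y_n)\, dG_n(\sigma)\, dH(\Theta_{M_n}) \\
&\quad + \int\int\int_{\theta_1 \in [-a-c,a+c],\ \theta_2 \in [-a-c,a+c]} L(\Theta_{M_n}, z, Y_n)\, dG_n(\sigma)\, dH(\Theta_{M_n}) \\
&\;\;\vdots \\
&= \sum_{l=1}^{j} \int_{W_l} L(\Theta_{M_n}, z, Y_n)\, dG_n(\sigma)\, dH(\Theta_{M_n})
 + \int_{W_j^c} L(\Theta_{M_n}, z, Y_n)\, dG_n(\sigma)\, dH(\Theta_{M_n}), \qquad (16)
\end{aligned}
$$

where $W_1 = \{\theta_1 \in [-a-c, a+c]^c$, rest $\theta_l$'s are in
$(-\infty, \infty)\}$, $W_l = \{\theta_1 \in [-a-c, a+c], \ldots, \theta_{l-1} \in [-a-c, a+c], \theta_l \in [-a-c, a+c]^c$,
rest $\theta_l$'s are in $(-\infty, \infty)\}$ for $l = 2, \ldots, j$, and
$W_j^c = \{\theta_1 \in [-a-c, a+c], \ldots, \theta_{j-1} \in [-a-c, a+c], \theta_j \in [-a-c, a+c]$,
rest $\theta_l$'s are in $(-\infty, \infty)\}$.

Also define $V_j$ and $E$ as the following:

$V_j = \{z \in R_1^* : \text{exactly } j \text{ many } \theta_l\text{'s are in } L(\Theta_{M_n}, z, Y_n)\}$, and

$E = \{$all $\theta_l$'s present in the likelihood are in $[-a-c, a+c]$ and the rest
$\theta_l$'s are in $(-\infty, \infty)\}$.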

With the above developments we now state the following results which con- 
tribute to the proof of Theorem 7.1. 

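LEMMA 7.2. Under the same assumptions as in Lemma 6.2 and Lemma 6.3,
$P(\sigma > \sigma_n \mid Y_n) = O(\epsilon^*_{M_n})$, where

$$
\epsilon^*_{M_n} = \frac{\epsilon_n}{1-\epsilon_n}\exp\left(\frac{n(a+c_1)^2}{2(b_n)^2}\right)\frac{(\alpha+M_n)^{M_n}}{\alpha^{M_n}H_0^{M_n}}.
$$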
PROOF. See Section S-3.1 of the supplement. □

LEMMA 7.3. Under the same assumptions as Lemma 6.3,
$P(Z \in R_1^*, \Theta_{M_n} \in E^c, \sigma \le \sigma_n \mid Y_n) = O((M_n - 1)B_{M_n})$,
where $B_{M_n} = (\alpha + M_n)\exp\left(-\frac{c^2}{4\sigma_n^2}\right)$.

PROOF. See Section S-3.2 of the supplement. □

Lemma 7.4. Let 



(17) 



C n > 



11 




+ O(ilog 








(4) 





where ">" stands for ">" as n — )• oo. 
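Then

(18) $P\left(Z \in R_1^*, \Theta_{M_n} \in E, \sigma \le \sigma_n \mid Y_n\right) = O\left(\left(1-\frac{1}{M_n}\right)^n\left(\frac{\alpha+M_n}{\alpha}\right)^{M_n}\right)$.

PROOF. See Section S-3.3 of the supplement. □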



Note that if $M_n < \sqrt{n}$, then it is easy to verify, using L'Hospital's rule,
that $\left(1-\frac{1}{M_n}\right)^n\left(\frac{\alpha+M_n}{\alpha}\right)^{M_n} \to 0$.
Lemma 7.4 thus formalizes the fact that, if the maximum number of components is
small compared to the data size, then, given an appropriate estimator of $\sigma$,
the probability that any mixture component will remain empty tends to zero as data
size increases. On the other hand, as we show later in Lemma 13.2, if $M_n > n$,
the probability that a mixture component will remain empty may converge to 1 as
$n \to \infty$.
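
The limit claimed for the order term in (18) can be explored numerically. The sketch below is an illustration under explicit assumptions: it takes $M_n = n^b$ and $\alpha = n^\omega$ with the hypothetical values $b = 0.4$ and $\omega = 0.3$ (so that $M_n$ grows slower than $\sqrt{n}$), and it works on the log scale to avoid underflow. The function name `log_empty_bound` is invented for the example.

```python
import math

def log_empty_bound(n, b, omega):
    """Log of (1 - 1/M)^n * ((alpha + M)/alpha)^M with M = n^b, alpha = n^omega."""
    m = n ** b
    alpha = n ** omega
    # log1p keeps precision when 1/M is small.
    return n * math.log1p(-1.0 / m) + m * math.log((alpha + m) / alpha)

values = [log_empty_bound(10.0 ** p, 0.4, 0.3) for p in (4, 6, 8)]
```

For these choices the log of the bound is negative and decreases as $n$ grows, since the term $n\log(1 - 1/M_n) \approx -n^{1-b}$ eventually dominates $M_n\log((\alpha+M_n)/\alpha)$ whenever $b < 1/2$.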



Remark: As shown in Section 3, $C_n$ approaches the true population variance as
$n \to \infty$. The condition of Lemma 7.4 says that $C_n$ is asymptotically larger
than a large number. But the prior assumption on $\sigma$, the common variance of
individual mixture components, tells us that as $n \to \infty$, the distribution of
$\sigma$ becomes degenerate at 0. These two conditions on $C_n$ and $\sigma$ may
appear contradictory. The answer to this apparent contradiction is that the concept
of $\sigma$, the common variance of individual mixture components, has nothing to
do with the population variance. $C_n$ becomes close to the population variance in
the long run, not to $\sigma$. Thus one should not confuse the concept of $C_n$
with the concept of $\sigma$.
LEMMA 7.5. $P\left(Z \in (R_1^*)^c, \theta_i \in [-a-c, a+c]^c, \sigma \le \sigma_n \mid Y_n\right) = O(B_{M_n})$,
where $B_{M_n}$ is defined in Lemma 7.3.

PROOF. When $z \in (R_1^*)^c$, then $\theta_i$ is present in the likelihood and
hence in $\Theta_z$ (defined in Lemma 7.4). Thus the same calculations associated
with Lemma 7.3, now only with $\theta_i$, guarantee the result. □

The following result asserts that the density estimators of both EW and SB
converge to the same model, so that $\mu^*(y) = \theta^*(y)$ for every $y$.

THEOREM 7.6. The models of EW and SB converge to the same distribution.
In other words, for every $y$, $\mu^*(y) = \theta^*(y)$, where $\mu^*(y)$ is given
in Theorem 6.1 and $\theta^*(y)$ is given in Theorem 7.1.

PROOF. See Section S-3.5 of the supplement. □ 

We strengthen our convergence results given by Theorems 6.1 and 7.1 by ob- 
taining the orders of the MISE of the density estimators given by (1) and (2). In 
fact, Theorems 6.1 and 7.1 are directly related to the bias of the MISE which can
be broken up into a variance part and a bias part. 

8. MISE bounds for the EW model. 

8.1. The main result for the EW model. The main result of this section is given 
by the following theorem. 

THEOREM 8.1. Under the assumptions stated in Sections 2, 4, and 5,

(19) $E_0^n\left[MISE(EW) I_{S_n}\right] = O\left(\left(\frac{\alpha}{\alpha+n}\right)^2 + B_n + \epsilon^*_n + \sigma_n^2\right)$.

To prove Theorem 8.1 we will break up MISE into variance and bias parts,
following the representation (8) of MISE, and will obtain bounds for the variance
and the bias parts separately, when $Y_n \in S_n$. These bounds will be independent
of both $y$ and $Y_n$.

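Note that

(20) $\mathrm{Var}\left(\hat f_{EW}(y \mid \Theta_n, \sigma) \mid Y_n\right) = \frac{1}{(\alpha+n)^2}\mathrm{Var}\left(\sum_{i=1}^{n}\frac{1}{(\sigma+k)}\phi\left(\frac{y-\theta_i}{\sigma+k}\right) \,\Big|\, Y_n\right)$.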
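8.2. Order of $\mathrm{Bias}(\hat f_{EW}(y \mid \Theta_n, \sigma))$. It follows from Theorem 6.1 that

(21) $\left| E\left(\hat f_{EW}(y \mid \Theta_n, \sigma) \mid Y_n\right) - \frac{1}{k}\phi\left(\frac{y-\mu^*(y)}{k}\right) \right| = O\left(\frac{\alpha}{\alpha+n} + \frac{n}{\alpha+n}\left(B_n + \epsilon^*_n + \sigma_n\right)\right)$.

For the rest of this paper we denote
$\frac{\alpha}{\alpha+n} + \frac{n}{\alpha+n}\left(B_n + \epsilon^*_n + \sigma_n\right)$
by $S_n^*$.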



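8.3. Order of $\mathrm{Var}\left(\frac{1}{(\sigma+k)}\phi\left(\frac{y-\theta_i}{\sigma+k}\right) \mid Y_n\right)$.

Lemma 8.2.

(22) $\mathrm{Var}\left(\frac{1}{(\sigma+k)}\phi\left(\frac{y-\theta_i}{\sigma+k}\right) \,\Big|\, Y_n\right) = O(B_n + \epsilon^*_n)$.

PROOF. See Section S-4.1 of the supplement. □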



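8.4. Order of the covariance term. For $i \neq j$, let

$$
\xi_{in} = \frac{1}{(\sigma+k)}\phi\left(\frac{y-\theta_i}{\sigma+k}\right) - E\left(\frac{1}{(\sigma+k)}\phi\left(\frac{y-\theta_i}{\sigma+k}\right) \,\Big|\, Y_n\right),
\qquad
\xi_{jn} = \frac{1}{(\sigma+k)}\phi\left(\frac{y-\theta_j}{\sigma+k}\right) - E\left(\frac{1}{(\sigma+k)}\phi\left(\frac{y-\theta_j}{\sigma+k}\right) \,\Big|\, Y_n\right),
$$

and

$$
cov_{ij} = \mathrm{Cov}\left(\frac{1}{(\sigma+k)}\phi\left(\frac{y-\theta_i}{\sigma+k}\right), \frac{1}{(\sigma+k)}\phi\left(\frac{y-\theta_j}{\sigma+k}\right) \,\Big|\, Y_n\right) = E\left(\xi_{in}\xi_{jn} \mid Y_n\right).
$$

Lemma 8.3.

(23) $cov_{ij} = O(B_n + \epsilon^*_n)$.

PROOF. See Section S-4.2 of the supplement. □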



8.5. Final calculations putting together the above results. For $Y_n \in S_n$, we
thus have

(24) $MISE(EW) = O\left(\frac{1}{(\alpha+n)^2}\left[n(B_n + \epsilon^*_n) + n(n-1)(B_n + \epsilon^*_n)\right] + (S_n^*)^2\right)$.
Assuming $n$ to be large enough such that $\frac{n(n-1)}{(\alpha+n)^2} \approx 1$,
the actual form of MISE given in equation (24) can be simplified further for
comparison purposes. Note that if $\frac{\alpha}{\alpha+n} \to 1$, then the
conditional distribution based on the Polya urn scheme implies that the
$\theta_i$'s arise from $G_0$ only, which seems to be too restrictive an
assumption. Thus assuming $\frac{\alpha}{\alpha+n} \to 0$ seems more plausible as
it entails a nonparametric set-up. We assume $\alpha = n^\omega$, $\omega < 1$, and
assume that $\frac{n}{\alpha+n} \approx 1$ for large $n$. So, (24) boils down to

(25) $MISE(EW) = O\left(\frac{n}{(\alpha+n)^2}(B_n + \epsilon^*_n) + (B_n + \epsilon^*_n) + \left(\frac{\alpha}{\alpha+n} + B_n + \epsilon^*_n + \sigma_n\right)^2\right)$

for $Y_n \in S_n$. We can further simplify this form by retaining only the higher
order terms. Note that we have assumed that under certain conditions $B_n$,
$\epsilon^*_n$ and $\sigma_n$ converge to 0, and hence
$\frac{n}{(\alpha+n)^2}(B_n + \epsilon^*_n) < O(B_n + \epsilon^*_n)$. In the third
term of equation (25) there are two extra terms, $\sigma_n$ and
$\frac{\alpha}{\alpha+n}$, under the squared term. Adjusting for that term we write
the simplified form of MISE, for $Y_n \in S_n$, as

(26) $MISE(EW) = O\left(\left(\frac{\alpha}{\alpha+n}\right)^2 + B_n + \epsilon^*_n + \sigma_n^2\right)$.

Note that the order remains unchanged after multiplication with $I_{S_n}$ and
taking the expectation $E_0^n$. In other words, Theorem 8.1 follows.

9. MISE bounds for the SB model. 

9.1. The main result for the SB model.

THEOREM 9.1. Under the above assumptions stated in Sections 3, 4, and 5,
$E_0^n\left[MISE(SB) I_{S_n}\right] = O\left(\left(1-\frac{1}{M}\right)^n\left(\frac{\alpha+M}{\alpha}\right)^M + M B_M + \epsilon^*_M + \sigma_n^2\right)$.

9.2. Bounds of Var(f S B(y I @M, OO- 
LEMMA 9.2. 

i M / 1 / 
1 ST^v I 1 ±(V 



5'-feW(53)N 

(i( 



(27) = 0|T7( M5 M+(1-^) l-^-] +4/ 



1 \ n fa + M 



M 
PROOF. See Section S-5.1 of the supplement.
9.3. Order of the covariance term. 



□ 



Lemma 9.3. 

M 



E E coo 

i=l j=l,j^i 



(a + k) \a + k J ' (a + k) \<t + k 



(28) 



O MB M + 1 



M 



a + M 



a 



M 



PROOF. See Section 5.2 of the supplement. 



□ 



9.4. Bound for the bias term. The bias of the MISE, which we denote by 

Bias(f S B{y | 6m, o-)), is given by 



1 

17 



M 



1 



[a + k) \a + k 



y 



From Theorem 7.1 we have, 
(29) 

Bias(f SB {y | @M,cr)) 2 = O 



MB M + 1 



k V k 



1 \ n fa + M 



M 



M 



Thus, the complete order of MISE can be obtained by adding up the individual
orders of (27), (28) and (29), yielding



MISE(SB) = O — MB M + 1 



1 

M 



a + M 



a 



M 



M% + 1 



M 



a + M 



o 



+ e M + 



(30) 
Appropriate choices of the sequences involved in MISE(SB) guarantee that
$M B_M \to 0$, $\left(1-\frac{1}{M}\right)^n\left(\frac{\alpha}{\alpha+M}\right)^M \to 0$ and $\epsilon_M^* \to 0$. With these we have

$$\mathrm{MISE}(SB) = O\left(\left(1-\frac{1}{M}\right)^n\left(\frac{\alpha}{\alpha+M}\right)^M + M B_M + \epsilon_M^* + \sigma_n^2\right). \tag{31}$$

Note that the order remains unchanged after multiplication with $I_{S_n}$ and taking
expectation with respect to $E_0^n$, thus proving Theorem 9.1.

10. Comparison between MISE's of EW and SB. As claimed in Mukhopadhyay,
Bhattacharya and Dihidar (2011) and Mukhopadhyay, Roy and Bhattacharya
(2012), the SB model is much more efficient than the EW model in terms of
computational complexity and ability to approximate the true underlying clustering or
regression. Here we investigate the conditions under which the model of SB beats
that of EW in terms of MISE. In particular, we provide conditions which guarantee
that each term of the order of MISE(EW) dominates the corresponding term
of the order of MISE(SB) (for any two sequences $\{a_n^{(1)}\}$ and $\{a_n^{(2)}\}$ we say that
$a_n^{(1)}$ dominates $a_n^{(2)}$ if $a_n^{(2)}/a_n^{(1)} \to 0$ as $n \to \infty$).

For the purpose of comparison we will use the same values of $b_n$, $\sigma_n$, $\epsilon_n$, for all
$n$, for both the SB and EW models, in a way such that both the MISE's converge to 0.

LEMMA 10.1. Let $\alpha = n^{\omega}$, $M = n^{b}$, where $\omega < 1$, $b < 1$, and $\omega < b$. Then,
$\epsilon_M^*/\epsilon_n^* \to 0$.

PROOF. The proof follows from simple applications of L'Hospital's rule. □ 

LEMMA 10.2. Let $M = n^{b}$ and $\alpha = n^{\omega}$, where $b > \omega$. Then $B_n > M B_M$ if
$M < \sqrt{n}$.

PROOF. The proof follows from simple applications of L'Hospital's rule. □ 

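LEMMA 10.3. $\dfrac{\left(1-\frac{1}{M}\right)^n\left(\frac{\alpha}{\alpha+M}\right)^M}{\left(\frac{\alpha}{\alpha+n}\right)^2} \to 0$, if $1 > b > \omega$.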

PROOF. The proof follows from simple applications of L'Hospital's rule. □ 

Hence, combining the results of Lemma 10.1 to Lemma 10.3 we conclude that
MISE(SB) converges to 0 at a faster rate than MISE(EW), provided that we
choose $M$ and $\alpha$ as required by Lemmas 10.1-10.3.
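
As a rough numerical illustration of this comparison, the following sketch evaluates, on the log scale, the term $(1-1/M)^n(\alpha/(\alpha+M))^M$ from the order of MISE(SB) against the term $(\alpha/(\alpha+n))^2$ from the order of MISE(EW), with $\alpha = n^{\omega}$ and $M = n^{b}$. The function names and the values $b = 0.4$, $\omega = 0.2$ are illustrative assumptions only; the terms $B_n$, $MB_M$, $\epsilon_n^*$ and $\epsilon_M^*$ are not evaluated here.

```python
import math


def log_sb_term(n, b, omega):
    """log of (1 - 1/M)^n * (alpha / (alpha + M))^M, with M = n^b, alpha = n^omega."""
    M = n ** b
    alpha = n ** omega
    # log(alpha / (alpha + M)) = -log(1 + M / alpha)
    return n * math.log1p(-1.0 / M) - M * math.log1p(M / alpha)


def log_ew_term(n, omega):
    """log of (alpha / (alpha + n))^2, with alpha = n^omega."""
    alpha = n ** omega
    return -2.0 * math.log1p(n / alpha)


def log_ratio(n, b, omega):
    """log of the SB term divided by the EW term; negative means SB is smaller."""
    return log_sb_term(n, b, omega) - log_ew_term(n, omega)


if __name__ == "__main__":
    b, omega = 0.4, 0.2  # assumed exponents with omega < b < 1
    for n in (10**2, 10**4, 10**6):
        print(n, log_ratio(n, b, omega))
```

Under these assumed exponents the log-ratio is negative and decreases as $n$ grows, which is what one would expect if the SB term is dominated by the corresponding EW term.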
11. Comparison with the optimal frequentist MISE rate. Under suitable
regularity conditions, the optimal MISE rate of convergence of a kernel density
estimator is $n^{-2/5}$. Ghosal and van der Vaart (2007) consider densities of the form

$$f_F(x) = \int \frac{1}{\sigma}\phi\left(\frac{x-\theta}{\sigma}\right) dF(\theta), \tag{32}$$

where $F \sim D(\alpha G_0)$. Thus, in contrast to our approach where we considered
predictive densities integrating out the random measure $F$, Ghosal and van der Vaart
(2007) deal with densities of the form (32) whose randomness is induced by $F$.
The remaining prior structure is very similar to ours. Assuming that $f_0$ is the true
density generating the data, Ghosal and van der Vaart (2007) consider convergence
(to zero) under the true distribution of the data, of posterior probabilities of the
form $P\{F : d_H(f_F, f_0) > C\epsilon_{0,n} \mid Y_n\}$, where $\epsilon_{0,n}$ is the rate of convergence that
Ghosal and van der Vaart (2007) are interested in, $C$ is some constant and $d_H$ is
some pseudo-metric; in particular, Ghosal and van der Vaart (2007) consider the
Hellinger distance between two densities. Under certain assumptions the best rate
of convergence turns out to be $n^{-2/5}(\log n)^{4/5}$, which is slower than the frequentist
MISE rate.

Here we show that in our approach, by choosing $\alpha$, $\epsilon_n$ and $\sigma_n$ appropriately,
MISE(SB) can achieve a smaller rate than $n^{-2/5}$. Moreover, with our approach,
MISE(EW) can achieve the optimal frequentist rate $n^{-2/5}$.



r 2 = 4 

i, r > 0. Then the order of MISE(EW) becomes 



Let a = < u) < 1. We set a„ = A, t > and e„. is chosen so that 



1 [a + n c 2 n*\ 1 1 

(l + n i-uy \ a ) n r re* 

If we choose r < |, t < | and u < |, then (33) will be less than n~ 2 / 5 . Moreover, 
under the same conditions MISE(SB) has a smaller rate than MISE(EW). The
form of our true distribution satisfies the assumption of Ghosal and van der Vaart
(2007) which requires it to tend to zero smoothly at the boundary points of its
support; it also satisfies those required for frequentist density estimation. Hence, for
our form of the true distribution, the convergence rates of the different methods are
comparable; we obtain the following ordering: $MISE(SB) \prec MISE(EW) \prec
FMISE \prec BR_{GVV}$, where $FMISE$ denotes the optimal frequentist MISE
rate and $BR_{GVV}$ denotes the best rate of convergence of Ghosal and van der Vaart
(2007).
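
To make the gap between the two benchmark rates concrete, the sketch below tabulates $n^{-2/5}$ against $n^{-2/5}(\log n)^{4/5}$ for a few sample sizes. The function names are hypothetical, and the rates of MISE(SB) and MISE(EW) themselves are not computed, since they depend on the choices of $\alpha$, $\epsilon_n$ and $\sigma_n$.

```python
import math


def frequentist_rate(n):
    """Benchmark frequentist rate n^(-2/5) quoted in the text."""
    return n ** (-2.0 / 5.0)


def gvv_rate(n):
    """Best rate n^(-2/5) * (log n)^(4/5) attributed to Ghosal and van der Vaart (2007)."""
    return n ** (-2.0 / 5.0) * math.log(n) ** (4.0 / 5.0)


if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6):
        # The last column is the logarithmic factor separating the two rates.
        print(n, frequentist_rate(n), gvv_rate(n), gvv_rate(n) / frequentist_rate(n))
```

The ratio of the two rates is $(\log n)^{4/5}$, which grows without bound, so the second rate is slower for every $n > e$.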

We now show that for $\sigma_n = n^{-1/5}$ (which can be thought of as analogous to the
optimum bandwidth in the frequentist density estimation paradigm) MISE(EW)
can achieve the rate $n^{-2/5}$ when minimized with respect to $\alpha$ over $0 < \alpha <
\infty$, that is, without assuming any particular form of $\alpha$. Taking u\ = n t , the
order of MISE(EW) is minimized for a = n 



1 



MISE(EW) in that case becomes 



The order of 



! c 2 n f \ 2 i /o _2 1 1 



(34) hl?^ +2^ + ^ + 



Hence, if cr^ = , then the order of MISE(EW) is also ^75, if r > t. 

However, note that the optimum order of MISE(EW) given by (34) is attained
when $\alpha$ is a decreasing function of $n$. This is not the same condition under
which we have proved better performance of the SB model over the EW model.
So it is of interest to study whether under this new condition also the SB model
outperforms the EW model. Using L'HospitaFs rule we can show, putting n — 

// — — 1 I in Lemmas 10.1-10.3, that the corresponding ratios still 



converge to 0 as $n \to \infty$. Thus the SB model can achieve an even smaller optimal
posterior MISE rate compared to the other methods.

12. The "large p small n" problem. So far we have discussed the asymptotic
performance of the SB and EW models in terms of MISE for univariate data.
We now investigate the MISE-based convergence properties when the sample size
$n$ is much smaller than $p$, the dimension of each data point. In rough terms, the
information contained in the data is much less compared to the number of parameters
to be estimated, which makes inference extremely challenging. This problem is the
well-known "large p small n" problem. We assume that as $n \to \infty$, $p \to \infty$ such
that $p/n \to \infty$. In fact, the data dimension should more appropriately be denoted
by $p_n$, but for the sake of convenience we suppress the suffix.

12.1. EW model. The EW model for $p$-variate data is defined as follows.

$$Y_i \sim N_p(\theta_i, \Sigma) \ \text{independently}, \tag{35}$$

where $Y_i = (Y_{i1}, \ldots, Y_{ip})'$, $\theta_i = (\theta_{i1}, \ldots, \theta_{ip})'$, $i = 1, \ldots, n$ and $\Sigma = \sigma^2 I_{p \times p}$,
where $I_{p \times p}$ is the $p \times p$ identity matrix.

$$\theta_i \overset{iid}{\sim} F,\ i = 1, \ldots, n; \quad F \sim D(\alpha G_0), \tag{36}$$

where under $G_0$, $\theta_i \sim N_p(\mu_0, \Sigma_0)$, where $\Sigma_0 = \sigma_0^2 I_{p \times p}$; $\mu_0$ and $\sigma_0$ are assumed
to be known. The prior on $\sigma$ remains the same as before. The predictive density at
the point $y = \{y_1, \ldots, y_p\}$ is given by:
$$f_{EW}(y \mid \Theta_n, \sigma) = \frac{\alpha}{\alpha+n}A_n + \frac{1}{\alpha+n}\sum_{i=1}^{n}\prod_{l=1}^{p}\frac{1}{(\sigma+k)}\phi\left(\frac{y_l-\theta_{il}}{\sigma+k}\right). \tag{37}$$

Let $S_n = \{Y_n : |Y_{ij}| < a;\ i = 1, \ldots, n,\ j = 1, \ldots, p\}$. The form of the
MISE and the approach to bound the MISE will be the same as in Section 2. So,
we proceed directly to obtain the bounds of the different parts of the MISE.
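
For intuition about the data structure behind (35) and (36), the following sketch simulates $p$-variate data with the random measure $F$ integrated out, so that the $\theta_i$'s are generated sequentially by the Polya urn scheme. It is only an illustration under stated assumptions: $\sigma$ is held fixed (the paper places a prior on it), $\mu_0$ is taken to be a common scalar for all coordinates, and the function name and all parameter values are hypothetical.

```python
import random


def sample_polya_urn_model(n, p, alpha, mu0, sigma0, sigma, seed=0):
    """Simulate theta_1..theta_n from the Polya urn induced by F ~ D(alpha * G0),
    with G0 = N_p(mu0 * 1, sigma0^2 I), then Y_i ~ N_p(theta_i, sigma^2 I)."""
    rng = random.Random(seed)
    thetas, ys = [], []
    for i in range(n):
        # With probability alpha / (alpha + i) draw a fresh theta from G0;
        # otherwise copy one of the i previously drawn thetas uniformly at random.
        if rng.random() < alpha / (alpha + i):
            theta_i = [rng.gauss(mu0, sigma0) for _ in range(p)]
        else:
            theta_i = thetas[rng.randrange(i)]
        thetas.append(theta_i)
        ys.append([rng.gauss(t, sigma) for t in theta_i])
    return thetas, ys


if __name__ == "__main__":
    # "Large p small n": many more coordinates than observations.
    thetas, ys = sample_polya_urn_model(n=30, p=300, alpha=1.0,
                                        mu0=0.0, sigma0=2.0, sigma=0.5)
    distinct = len({tuple(t) for t in thetas})
    print("observations:", len(ys), "dimension:", len(ys[0]), "distinct thetas:", distinct)
```

The number of distinct $\theta_i$'s is typically much smaller than $n$ for small $\alpha$, which is the clustering behaviour that the mixture representation exploits.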



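LEMMA 12.1. Under the same assumptions as in Lemma 6.2,
$P(\sigma > \sigma_n \mid Y_n) = O(\epsilon_n^L)$, where $\epsilon_n^L = \frac{\epsilon_n}{1-\epsilon_n}\exp\left(\frac{np(a+c_1)^2}{2 b_n^2}\right)\frac{(\alpha+n)^n}{\alpha^n H_0^{np}}$.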

PROOF. See Section S-6.1 of the supplement. □ 

Let $E = \{\theta_{il} \in [-a-c, a+c],\ \forall l$, rest $\theta_i$'s are in $\mathbb{R}\}$ and $E^c = \{$at least one
$\theta_{il} \in [-a-c, a+c]^c$, $l = 1(1)p$, rest $\theta_i$'s are in $\mathbb{R}\}$; $\mathbb{R}$ representing the real line.
We then have the following result.

LEMMA 12.2. Under the conditions of Lemma 6.3 it can be shown that
$P(\Theta_n \in E^c, \sigma \le \sigma_n \mid Y_n) = O(pB_n)$ where $B_n$ =

PROOF. See Section S-6.3 of the supplement. The proof proceeds by splitting
the integral $\int_{\Theta_n}\int_0^{\sigma_n} L(\Theta_n, Y_n)\,dH(\Theta_n)\,dG_n(\sigma)$ into a sum of integrals over particular
regions provided in Section S-6.2 of the supplement. □

Hence, from Lemmas 12.1 and 12.2, and from the same argument used for
obtaining the order of MISE(EW) in Section 2, here we obtain the order of
MISE(EW) in the large p small n case as

$$E_0^n\left[\mathrm{MISE}(EW)\, I_{S_n}\right] = O\left(\left(\frac{\alpha}{\alpha+n}\right)^2 + pB_n + \epsilon_n^L + \sigma_n^2\right). \tag{38}$$



12.2. SB model. Here the model is

$$f_{SB}(y \mid \Theta_M, \sigma) = \frac{1}{M}\sum_{i=1}^{M}\prod_{l=1}^{p}\frac{1}{(\sigma+k)}\phi\left(\frac{y_l-\theta_{il}}{\sigma+k}\right), \tag{39}$$

where $\Theta_M = (\theta_1, \ldots, \theta_M)'$, $\theta_i = (\theta_{i1}, \ldots, \theta_{ip})'$, $\Sigma = \sigma^2 I_{p \times p}$ and the prior
assumptions are the same as in equations (35) and (36).

We bound the individual probabilities in the same way as in Section 9. 



LEMMA 12.3. $P(\sigma > \sigma_n \mid Y_n) = O(\epsilon_M^L)$, where

$$\epsilon_M^L = \frac{\epsilon_n}{1-\epsilon_n}\exp\left\{\frac{np(a+c_1)^2}{2(b_n)^2}\right\}\frac{(\alpha+M)^M}{\alpha^M H_0^{Mp}}.$$

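PROOF. Here the form of the likelihood is

$$L(\Theta_M, z, Y_n) = \frac{1}{\sigma^{np}}\, e^{-\frac{1}{2\sigma^2}\sum_{j=1}^{M}\sum_{t:z_t=j}\sum_{l=1}^{p}(Y_{tl}-\theta_{jl})^2}. \tag{40}$$

Application of the same technique as in Lemma 7.2 leads to the required result. □

Let $E = \{$all $\theta_i$'s in likelihood are in $[-a-c, a+c]$, rest are in $\mathbb{R}^p\}$.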



LEMMA 12.4. $P(Z \in R_1^*, \Theta_M \in E^c, \sigma \le \sigma_n) = O((M-1)pB_M)$, $B_M$
same as in Lemma 7.3.

PROOF. In the same way as in Lemma 7.3 it follows that

$$P(Z \in R_1^*, \Theta_M \in E^c, \sigma \le \sigma_n) \le (M-1)pB_M.$$

The splitting of the integral $\int_0^{\sigma_n}\int_{\Theta_M} L(\Theta_M, z, Y_n)\,dH(\Theta_M)\,dG_n(\sigma)$ required for this
proof is provided in Section S-7. □

LEMMA 12.5. P(Z G Rl,Q M eE,a<a n ) = ((l - i) n 

PROOF. Exactly same as the proof of Lemma 7.4. □ 

LEMMA 12.6. $P(Z \in (R_1^*)^c, \Theta_M \in E^c, \sigma \le \sigma_n) = O(pB_M)$.

PROOF. The proof follows in the same way as that of Lemma 7.3. □ 

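Proceeding in the same way as in Section 9 we can show that the order of MISE
of the SB model for the "large p small n" case is

$$E_0^n\left[\mathrm{MISE}(SB)\, I_{S_n}\right] = O\left(\left(1-\frac{1}{M}\right)^n\left(\frac{\alpha}{\alpha+M}\right)^M + MpB_M + \epsilon_M^L + \sigma_n^2\right). \tag{41}$$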
Note that the two "large p, small n" MISE's given by (38) and (41) converge
to zero if $\sigma_n$ and $\epsilon_n$ converge to zero at a much faster rate than the rate needed for
MISE's given by (26) and (31). This is clear because $p$ grows much faster than
$n$ and so, the terms $pB_n$ and $MpB_M$ involved in the MISE's do not go to zero
unless $\sigma_n$ is made to converge to zero much faster than in the usual cases. Also,
$\epsilon_n$ must go to zero much faster than in the usual situations to ensure that $\epsilon_n^L$ and
$\epsilon_M^L$ converge to zero. In other words, rather strong prior information is necessary
to compensate for the deficiency of information in the data.

The above arguments show that even in the "large p small n" problem both the EW
and SB models can be consistent in terms of MISE by choosing the prior on $\sigma$
appropriately. However, it is easy to see that even in this set-up, $MISE(SB) \prec
MISE(EW)$.

13. Convergence to wrong model. We now investigate conditions under which
the models of EW and SB converge to wrong models, that is, to models which did
not generate the data. It is perhaps easy to anticipate that if $\alpha$ is made to grow at a
rate faster than $n$, then the density estimator of EW would converge to the convolution
of the kernel and $G_0$, irrespective of the true, data-generating model. If $G_0$
has non-compact support then the convolution cannot be represented in the form
$\frac{1}{k}\phi\left(\frac{y-\mu^*(y)}{k}\right)$, which we have assumed to be the form of the true distribution from
which the data are obtained. We show in this section that indeed the simple condition
of letting $\alpha$ grow faster than $n$ is enough to derail the EW model. On the other
hand, much stronger conditions are necessary to get the SB model to converge to
the wrong model.

13.1. EW model. 

THEOREM 13.1. Suppose that $\alpha \succ O(n)$. Then
(42) E% [E [fEwiy | 9 n ,a)) I Sn \ -> / ^^dG (9 n+1 ). 

Proof. See Section S-8.1 of the supplement. □ 

Thus, the condition $\alpha \succ O(n)$ is enough to mislead the EW model, taking it to
a wrong model.
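
The mechanism can be seen from the weight $\alpha/(\alpha+n)$ that the Polya urn scheme attaches to the base measure $G_0$. The sketch below, with a hypothetical function name and assumed growth exponents, contrasts $\alpha = n^{1.5}$ (faster than $n$) with $\alpha = n^{0.5}$ (slower than $n$).

```python
def base_measure_weight(n, exponent):
    """Weight alpha / (alpha + n) on G0 when alpha = n ** exponent."""
    alpha = n ** exponent
    return alpha / (alpha + n)


if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6):
        # exponent > 1: weight tends to 1; exponent < 1: weight tends to 0.
        print(n, base_measure_weight(n, 1.5), base_measure_weight(n, 0.5))
```

When the weight tends to 1 the data-driven part of the predictive density is asymptotically ignored, which is consistent with the limit in Theorem 13.1 depending only on the kernel and $G_0$.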

13.2. SB model. Recall that no condition on $\alpha$ is necessary to ensure asymptotic
convergence of MISE(SB) given by (31); only $\sigma_n$ needs to be chosen appropriately,
and $M < \sqrt{n}$ is required. So, unlike in the case of EW, even if $\alpha \succ O(M)$
or $\alpha \succ O(n)$, the SB model can still converge to the true distribution by setting
$M \prec O(\sqrt{n})$.

However, for $M \succ O(n)$, the term $\left(1-\frac{1}{M}\right)^n\left(\frac{\alpha}{\alpha+M}\right)^M$ does not converge to 0.
This term is associated with the upper bound of the posterior probability $P(Z \in
R_1^*, \Theta_M \in E, \sigma \le \sigma_n \mid Y_n)$. The next question that arises is whether there are conditions
under which this probability can converge to 1 as $n \to \infty$. Towards answering this
question, let us consider the following result.

LEMMA 13.2. Let {r n } be a sequence tending to zero such that O {\og \Jj~\ J 
— nlog(r n ) — nlog(n), and let C n = O s > 2. Then 

(43) P(ZeRl,e M EE,a<a n \Y n )> (l - JL 

PROOF. See Section S-8.2 of the supplement. □ 
Remark 1: Note that C n = O [ ) ; s > 2, as required by Lemma 13.2 need 

\ r n n ) 

not necessarily ensure that ^ remains bounded as n — > oo. In such cases we can 
not assume that \ Yi\ < a;i = 1, . . . , n, since this forces — to be bounded. Thus, 
we must assume in such cases that $-\infty < Y_i < \infty$; $i = 1, \ldots, n$. This has no
effect on the proofs of our results in this section since none of the proofs of these
results depends upon the range of the $Y_i$'s.



a 



M 



a 



1V1 



Remark 2: If Yj are not bounded, for M >~ 0(n), it is expected that ^ will grow 
with n. This is because the mixture components from which the data arise may be 
very widely separated when the number of mixture components far exceeds the 
number of data points. 

Remark 3: Remarks 1 and 2 above show that for the probability (43) to be
large, it may require the data to come from unbounded regions, and to have
very high variability. Since Lemma 13.2 and the probability given by (43) are
instrumental for the convergence of the SB model to the wrong model (to be seen
subsequently), it follows that in this respect as well the SB model seems to be superior
to the EW model, since the latter does not require similar assumptions on
unboundedness.



Using L'Hospital's rule it is easy to check that for $M = n^{b}$, $\alpha = n^{\omega}$, $\omega > 0$,
$b > 0$, $\omega, b > 1$, and $\omega - b > b$, $\left(\frac{\alpha}{\alpha+M}\right)^M\left(1-\frac{1}{M}\right)^n \to 1$. That is, if $M \succ n$,
the probability that a mixture component will remain empty may converge to 1 as
$n \to \infty$.
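
A quick numerical check of this limit is sketched below. It evaluates $\left(\frac{\alpha}{\alpha+M}\right)^M\left(1-\frac{1}{M}\right)^n$ through its logarithm for $M = n^{b}$, $\alpha = n^{\omega}$; the function name and the exponents $b = 1.5$, $\omega = 3.5$ (chosen so that $b > 1$ and $\omega - b > b$) are illustrative assumptions.

```python
import math


def log_empty_component_term(n, b, omega):
    """log of (alpha / (alpha + M))^M * (1 - 1/M)^n with M = n^b, alpha = n^omega."""
    M = n ** b
    alpha = n ** omega
    # log1p keeps precision when M / alpha and 1 / M are very small.
    return -M * math.log1p(M / alpha) + n * math.log1p(-1.0 / M)


if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6):
        print(n, math.exp(log_empty_component_term(n, b=1.5, omega=3.5)))
```

For these exponents the leading behaviour is $\exp(-M^2/\alpha - n/M)$, and both $M^2/\alpha$ and $n/M$ tend to 0, so the printed values increase towards 1.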



Thus for $M \succ n$, $\alpha \succ n$, the posterior probability of $\{Z \in R_1^*, \Theta_M \in E, \sigma \le
\sigma_n\}$ converges to 1, as $n \to \infty$. Hence, in $E\left(\frac{1}{(\sigma+k)}\phi\left(\frac{y-\theta_i}{\sigma+k}\right) \Big|\, Y_n\right)$, the factor
associated with the above posterior probability is the only contributing term for
large $n$. We now study convergence of this term.



THEOREM 13.3. Assume the conditions of Lemma 13.2. Further assume that
$M = n^{b}$, $\alpha = n^{\omega}$, $\omega > 1$, $b > 1$ and $\omega - b > b$. Then, for the SB model it holds
that



(44) E% 



EU SB (y\@ M ,(T) 



1 (y-^ 2 , K 



k 



PROOF. See Section S-8.3 of the supplement. □



Thus, while the EW model can converge to the wrong model if only $\alpha \succ O(n)$
is assumed, much stronger restrictions are necessary to get the SB model to deviate
from the true model.



14. Modified SB Model. A slightly modified version of the SB model is as follows:

$$f^*_{SB}(y \mid \Theta_M, \Pi, \sigma) = \sum_{i=1}^{M}\pi_i\frac{1}{(\sigma+k)}\phi\left(\frac{y-\theta_i}{\sigma+k}\right), \tag{45}$$

where $\sum_{i=1}^{M}\pi_i = 1$. We assume that $\Pi = (\pi_1, \ldots, \pi_M) \sim Dirichlet(\beta_1, \ldots, \beta_M)$,
$\beta_i > 0$ and is independent of $\Theta_M$ and $\sigma$. The assumptions of the Dirichlet process prior
on $\Theta_M$ and the prior structure of $\sigma$ remain the same as before. The previous form of
the SB model (2) is a special case (discrete version) of this model with $\pi_i = \frac{1}{M}$ for
each $i$.

Due to discreteness of the Dirichlet process prior, the parameters $\theta_i$ are coincident
with positive probability. As a result, (45) reduces to the form

$$f^*_{SB}(y \mid \Theta_M, \Pi, \sigma) = \sum_{i=1}^{M^*} p_i\frac{1}{(\sigma+k)}\phi\left(\frac{y-\theta_i^*}{\sigma+k}\right), \tag{46}$$

where $\{\theta_1^*, \ldots, \theta_{M^*}^*\}$ are $M^*$ distinct components in $\Theta_M$ with $\theta_i^*$ occurring $M_i$
times, and $p_i = \sum_{j:\theta_j = \theta_i^*}\pi_j$. In contrast to the previous form of the SB model (2)



imsart-aos ver. 2012/04/10 file: mise.tex date: March 4, 2013 



26 



MUKHOPADHYAY AND BHATTACHARYA 



where the mixing probabilities are of the form $M_i/M$, here the mixing probabilities
$p_i$ are continuous.
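
The reduction from a mixture over all $M$ components to a mixture over the $M^*$ distinct ones can be illustrated with a short sketch. The code below is only a toy example under assumptions: $\phi$ is taken to be the standard normal density, the $\theta_i$ and $\pi_i$ values are made up, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def normal_pdf(x):
    """Standard normal density (assumed form of the kernel phi)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def mixture_density(y, thetas, weights, sigma, k):
    """sum_i w_i * phi((y - theta_i) / (sigma + k)) / (sigma + k)."""
    s = sigma + k
    return sum(w * normal_pdf((y - t) / s) / s for t, w in zip(thetas, weights))


def collapse_ties(thetas, weights):
    """Merge coincident thetas and add up their weights: p_i = sum of pi_j with theta_j = theta_i*."""
    merged = defaultdict(float)
    for t, w in zip(thetas, weights):
        merged[t] += w
    distinct = list(merged.keys())
    return distinct, [merged[t] for t in distinct]


if __name__ == "__main__":
    thetas = [0.0, 1.0, 0.0, 1.0, 2.0]        # M = 5 components, M* = 3 distinct values
    weights = [0.10, 0.20, 0.30, 0.15, 0.25]  # one draw of (pi_1, ..., pi_M), sums to 1
    distinct, p = collapse_ties(thetas, weights)
    print(distinct, p)
    # Both representations give the same density value.
    print(mixture_density(0.5, thetas, weights, sigma=0.3, k=1.0),
          mixture_density(0.5, distinct, p, sigma=0.3, k=1.0))
```

Setting every weight to $1/M$ in this sketch recovers mixing probabilities of the form $M_i/M$, as in the previous form of the SB model.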

The asymptotic calculations associated with the modified SB model are almost
the same as in the case of the SB model in Section 9. Indeed, this modified version
of the SB model converges to the same distribution to which the EW model and the
previous version of the SB model also converge. Moreover, the order of MISE
for this model remains exactly the same as that of the previous version of the SB
model. In Section S-9 of the supplement we provide a brief overview of the steps
involved in the asymptotic calculations.

SUPPLEMENTARY MATERIAL 

Throughout, we refer to our main paper Mukhopadhyay and Bhattacharya (2012a) 
as MB. 



S-1. Proofs of results associated with Section 3 of MB.

S-1.1. Proof of Lemma 3.1.

Proof. Let C n = — j-, . We recall that Y n form a triangular 

array, as argued in Section 2 of MB. The $n$-th row of that array is summarized by
the statistic $C_n$. Since the random variables of a particular row of that array are
independent of the random variables of the other rows, the $C_n$ are independent among
themselves.



Suppose $\mu_T$ is the true population mean and $\sigma_T^2$ is the true population variance
(both of which are assumed to be finite). Since all $Y_i$'s are from the same true density
$f_0$, under $f_0$, $E^{f_0}(Y_i) = \mu_T$ and $V^{f_0}(Y_i) = \sigma_T^2$.



(1) 



E K 



Y\z 



--l n 3 
Note that 



. Eft** 



(2) 
not 

o\. Hence, 



n 

M n - l 
n 



noting the fact that since, given that Y ~ /o, Y and Z are independent, Vy° z (Yi) 



(3) 



Ef0 ( Zr^fr-YY ) = Mn^l 



°T = Mr, 



Note that if $M_n > 1$ for all $n$, then $\mu_n^* > 0$ and if $M_n \prec O(n)$, then $\mu_n^* \to 0$.
Similarly we can split the variance term as

yfo 



Eft 



(4) 



+ E z 



(pMYj ~ VT? ~ n(Y - Mr ) 2 j 



From (2) we have Ep, (Eft nj(Yj - /j r ) 2 - n(Y - /x T ) 2 ) is free of z. So the 



'I* 

first term in the summation of (4) is 0. Easy calculations shows that the order the 

second term in that summation is nx 2 M yi ■ 

Thus 



(5) 



yfo 



Ef^jiYj-Yf 



Eft-, 



O 



Note that for M n > 2, V fo 
Y,f=in 3 (Y 3 -Yf 



'T,f=\n 3 {Y 3 -Yf 



Eft n 



n x (M n ) 

converges to 0. Hence, under 



/o. 



j=l '"3 / 

0, in probability, for M n > 1 and M n -< 0(n). 



Eft nj 
Also we note that, under $f_0$,

,2 -.2 



Thus, for M n > 1, 

oo 
n=l 



> e 



< -E fo 
e 



Eft" 



2™ 



n x (M n y 



E^ 



> e < oo. 



Hence we conclude that, under fo, 



Y!t=\ n 3 ( Y j-YY 



"-Mr. 



0, a.s., for M n > 



1 and M n -< 0{n). Also we have [i* n — > 0, a.s. under the same set of conditions. 

0, 

a.s., for M„ > 1 and M„ -< O(n) 



e£%(^-?) 2 

Combining these two results we have that, under fo, — - -r-? 



□ 



S-2. Proofs of results associated with Section 6 of MB. 

S-2.1. Proof of Lemma 6.2.

Proof. P(a > a n \Y n ) = \ 7 

L Ie n U] =1 \4> (^) dG n {a)dH n (6n) 
^, where H n (Q n ) is the joint distribution of n . 



Denote L(e„,y) = n"=i^ i ' — 



Y,- 



2 / / 

J<T£(b n ,a n ) JG 



L(6n,y)ciG„( C T)^n(e rt ) 



- J P(6„<a< C Tn)P(enG J Bn), 



where $E_n = \{\theta_i^* \in [-c_1, c_1],\ i = 1, \ldots, n\}$, and $\theta_i^* \in (-c_1, c_1)$, $c_1 > 0$ is a
very small constant.
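Now $|Y_i| < a$, $|\theta_i^*| < c_1 \Rightarrow (Y_i - \theta_i^*)^2 < (a + c_1)^2$, $\forall i \Rightarrow \sum_{i=1}^{n}(Y_i - \theta_i^*)^2 <
n(a+c_1)^2$. Again, from properties of the Polya urn, $H_n(\Theta_n) \ge \prod_{i=1}^{n}\frac{\alpha}{\alpha+i-1}G_0(\theta_i) \Rightarrow$

$$P(\Theta_n \in E_n) \ge \left(\frac{\alpha}{\alpha+n}\right)^n H_0^n.$$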
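
The Polya urn step above rests on the elementary inequality $\prod_{i=1}^{n}\frac{\alpha}{\alpha+i-1} \ge \left(\frac{\alpha}{\alpha+n}\right)^n$, which holds because each factor $\frac{\alpha}{\alpha+i-1}$ is at least $\frac{\alpha}{\alpha+n}$. The following sketch checks this numerically on the log scale; the function names and the $(n, \alpha)$ pairs are illustrative assumptions.

```python
import math


def log_urn_product(n, alpha):
    """log of prod_{i=1}^{n} alpha / (alpha + i - 1)."""
    return sum(math.log(alpha) - math.log(alpha + i) for i in range(n))


def log_uniform_bound(n, alpha):
    """log of (alpha / (alpha + n))^n."""
    return n * (math.log(alpha) - math.log(alpha + n))


if __name__ == "__main__":
    for n, alpha in [(5, 0.5), (50, 2.0), (500, 10.0)]:
        print(n, alpha, log_urn_product(n, alpha), log_uniform_bound(n, alpha))
```

The cruder bound $\left(\frac{\alpha}{\alpha+n}\right)^n$ is what produces the factor $\frac{(\alpha+n)^n}{\alpha^n}$ appearing in the final order of the lemma.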

Now, the function ^ exp{— ^^lL| j s increasing on b n < a < a n for cr n < 
y/n(2a + c). Hence, 



(6) 



-n{a-\-c-iY 



D > 



-P(b n < a < o Ti 



a 



a + n 



HI 



For the numerator observe that for > o n , 
plies 



e 



N 



I I L(& n ,Y)dG n (o)dH n (@ r 

J & n J a>a„ 



a' 



< 



1 



^ (^F- This im " 



P(o > o n ) 



(V) 



(6) and (7) together implies that, 

P(o > o n \Y n ) < ^ > ^ 



Q(Cn) 



6 n w(o+ } 2 (a n)n . 

: e 2f, n 2 \ s „ J„ = A* , say. 



1 - P(b n <o<o n ) (<r n )» " 

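Since $b_n < \sigma_n$, we have $b_n^n < (\sigma_n)^n$; also by the assumption of the lemma
$P(\sigma > \sigma_n) = O(\epsilon_n)$, $P(\sigma \le \sigma_n) = O(1 - \epsilon_n)$, so that $P(b_n < \sigma \le \sigma_n) \le
P(\sigma \le \sigma_n)$ implies $P(b_n < \sigma \le \sigma_n) = O(1 - \epsilon_n)$. Thus $A_n^* = O(\epsilon_n^*)$, where

$$\epsilon_n^* = \frac{\epsilon_n}{1-\epsilon_n}\exp\left(\frac{n(a+c_1)^2}{2b_n^2}\right)\frac{(\alpha+n)^n}{\alpha^n H_0^n}.$$

This completes the proof. □

S-2.2. Proof of Lemma 6.4.
Proof.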



A 



(y-er 



(o + k) 
< H x f dG (8) 



e ^^dG (e) 



(8) 



H 



where H x = su V{yM < 



l- 

(y-B) 2 



i. Thus A n = O(l), and 



(9) 



a 



-A n = O 

a + n \a + n 



a 



Since $\alpha = O(n^{\omega})$; $0 < \omega < 1$, $\frac{\alpha}{\alpha+n} \to 0$, and hence the proof follows. □
S-2.3. Proof of Theorem 6.1.
Proof. For $Y_n \in S_n$,

1 , (v 



E 



a + k) \ a + k 

i f f Jv 



j J q 4> (jyf) L ( @ n, Y n )dH(Q n )dG n (a) 



D 

J I + J 2 + J 3 



(10) 



where 

Jl = h U (^§) L ^ Y n )dH(Q n )dG n (a), 









(y-9i 




\ (T+k 




fy-0i 



J * = h Ir 2 (SO L(0 ™' Yn)dH(Q n )dG n (a), 

■h = h L, ^ l^t) L(&n, Y n )dH(Q n )dG n (a), 



$R_1 = \{\theta_i \in [-a-c, a+c],\ \sigma \le \sigma_n\}$,
$R_2 = \{\theta_i \in [-a-c, a+c]^c,\ \sigma \le \sigma_n\}$,
$R_3 = \{\sigma > \sigma_n\}$.

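Then it follows from Lemma 4.1 of Mukhopadhyay and Bhattacharya (2012a)
that

$$J_3 \le H_1 P(R_3 \mid Y_n) \le H_1\epsilon_n^*, \tag{11}$$

where $H_1 = \sup_{\{y,\theta_i,\sigma\}}\left\{\frac{1}{(\sigma+k)}\phi\left(\frac{y-\theta_i}{\sigma+k}\right)\right\} = \frac{1}{k}$.

Using Lemma 4.2 of Mukhopadhyay and Bhattacharya (2012a) we obtain

$$J_2 \le H_1 P(R_2 \mid Y_n) \le H_1 B_n. \tag{12}$$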



<13) Ji - KfeW (SrS) (1 " p <*i Y *> ~ p • 

where, for every $y$, $\mu_n^*(y) \in (-a-c, a+c)$ and $\nu_n(y) \in (0, \sigma_n)$, applying GMVT.

Let us choose $\epsilon_n$ and $\sigma_n$ in a way such that $\epsilon_n^*$ and $B_n$ converge to zero as
$n \to \infty$. Now we note that, in $J_1$, the range of $\theta_i$ remains the same for all $n$. It is the
range of $\sigma$ that varies with $n$ and the point $\nu_n(y)$ varies with $n$. This has the effect of
varying $\mu_n^*(y)$ since the kernel depends upon both $\theta_i$ and $\sigma$. This implies that $\mu_n^*(y)$
and $\nu_n(y)$ depend upon $\sigma_n$, so that $\mu_n^*(y) = \mu(\sigma_n, y)$, $\nu_n(y) = \vartheta(\sigma_n, y)$, such that
$\vartheta(0, y) = 0$ (note that $\nu_n(y) < \sigma_n$ and $\sigma_n \to 0$). We also assume that $\mu(0, y) =
\mu^*(y)$. As before, we assume that $\mu$ and $\vartheta$ are continuously differentiable at least
once; indeed we can choose $\mu$ and $\vartheta$ to be smooth functions such that $\mu_n^*(y) =
\mu(\sigma_n, y)$, $\mu(0, y) = \mu^*(y)$, $\nu_n(y) = \vartheta(\sigma_n, y)$, and $\vartheta(0, y) = 0$.

Then the following Taylor series expansion is valid (letting $x = \sigma_n$),



1 



(&{x,y) + k) 



1 

V 



y- fi{0,y) 
k 



y - K x ,y) 

&(x,y) + k 
d 



+ x 



dx 



1 



y - K x >v) 



k 



x=x* 



X#(x,y) + k) \<P(x,y) 

where $x^*$ lies between 0 and $x$. Noting that the terms are bounded for any $y \in \mathbb{R}$,
where $\mathbb{R}$ denotes the real line, we have,



(14) sup 

— oo<y<oo 



1 



v n {y) + k 



(v n (y) + k) 
Thus, for any y, we have that (t , n( * )+fc) 

It follows from (14) that 



1 

k' 



v n {y)+k 



y 



k 

y-v*(y) 



0(a n ) 



(15) 



sup 

-oo<j/<oo 



1 



y-t**(y) 
k 



0(B n 



Finally, since B„ and e* can be made to converge to 0, J2, J3 — > 0, J% 



y-v*(y) 

k 



J\dy 



. Also it holds that 
1 

y JRx (ff + *0 



1 

D 



a + k 

l-P(R 2 \Y n )-P(R 3 \Y n ) 



L(Q n ,Y n )dH(Q n )dG n (a)dy 



(16) 



1. 



y-v*(y) 

k 



, if we can show that 



Again, since we have proved above that J\ — > { 

for all n, J\ given by (13) is bounded above by an integrable function h(y) for 
every y, then it follows from the dominated convergence theorem (DCT) that 



(17) 



J\dy 



y - n*(y) 
k 



dy. 



To show that J\ is bounded above by an integrable function h, first note that J\ < 

Irk ( y-Vn(y) \ 

\ vn(y)+kj- 



We then define h to be the following. 

1 

k 



Kv) 



if yG 



otherwise, 
where $-\ell_1 \ll -a-c$ and $\ell_2 \gg a+c$, and $C^*$ satisfies

y 2 (v n (y) + k) 2 



(cry > sup 



{y-pr n {y)f 



where S = {n,y G [-4, ^] c , G (-0 - c,o + c),i? n (y) G (0,cj n ),cr n < 

if*}. The right hand side of the above inequality is clearly finite. Moreover, it is 

easy to see that J\ < \§ (^f-^jj^j < h(y) for every n, and almost all y, and that 
h is an integrable function. Hence, DCT holds. 
It follows from (16) and (17) that 



(18) 



k 



dy = 1, 



showing that $f_0(y) = \frac{1}{k}\phi\left(\frac{y-\mu^*(y)}{k}\right)$ is a density. Indeed, we consider $f_0$ as the
true data-generating density.

Now note that,



e yfEw{y I @n,cr) 



o 



a 



J a + n^ V( CJ + 



a + n 

where the first term is thanks to (9). Further note that, 



(19) 



1 



a + n 



i=l 



1 



[a + k) \a + k 



y 



n 



a + n 



+ k) \a + k 



x (Ji + J 2 + J 3 ). 



£ ( I 6n,cr) 

q: 



y n ) - ,/;,(//! 



\a + n J a + n 

+^—{Ji - Mv)) + ^—(J2 + J 3 ) 

a + n a + n 



Since / (y) < h we have 



E yfEw{y \@n,cr) 

a 



Y n - My) 



< o 



n 



, + 

a + n J a + n 
n 



\J2\ + \J3\) 



+ - 



a + n 



Ji-Mv)\ 
It follows from (11), (12), and (15) and the fact that the orders of $J_1$, $J_2$, $J_3$ are
independent of $y$, that, for any $Y_n \in S_n$,



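$$\sup_{-\infty<y<\infty}\left|E\left(f_{EW}(y \mid \Theta_n, \sigma) \mid Y_n\right) - f_0(y)\right| = O\left(\frac{\alpha}{\alpha+n} + \frac{n}{\alpha+n}\left(B_n + \epsilon_n^* + \sigma_n\right)\right),$$

proving the theorem. □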



S-3. Proofs of results associated with Section 7 of MB. 

S-3.1. Proof of Lemma 7.2. 

E z TJe M L(Q Mn ,z,Y n )dH(Q Mn )dG n (a) 



Proof. P (a > a„\Y r 



Ez Jo°° /e M . L (&M n ,z, Y n )dH(Q Mn )dG n (a) 



N 



D , where H{®M n ) is the joint distribution of &M n and 

1 -IV 

p 2 lif.xt 



^5 1 .iv ( Y ^' 1 
L(Q Mri ,z,Y n ) = H—e 



where $n_j = \#\{t : z_t = j\}$ (for any set $A$, $\#A$ denotes the cardinality of the set
$A$). Let $E^* = \{$all $\theta_l \in \Theta_{M_n}$ are in $[-c_1, c_1]\}$.
Then, 



L(@ Mn ,z, Y n )dH(@ Mn )dG n (a) 



Je 



> / / L(e A . /n ,z,r n )dF(e M JdG„(a) 



(20) 

Now note that 



$$|Y_t| < a,\ |\theta_j^*| < c_1 \Rightarrow (Y_t - \theta_j^*)^2 < (a + c_1)^2 \Rightarrow \sum_{j=1}^{M_n}\sum_{t:z_t=j}(Y_t - \theta_j^*)^2 < n(a + c_1)^2.$$



Again, from the Poly a urn scheme we have H(&M n ) > Y\j=i a +M ^' 



where H = G (6)d6. 



which implies P (9 Mri E £*) > 
Hence, for $\sigma_n < \sqrt{n}(a + c_1)$,

Again, in the same way as Lemma 6.2,N < M™ ^ 1 ^ n P(a > a n ). Since P(a > 
a n ) = 0(e n ) and P(b n < a < a n ) = 0(1 - e n ). Thus, 

$$P(\sigma > \sigma_n \mid Y_n) = O(\epsilon^*_{M_n}),$$

where

$$\epsilon^*_{M_n} = \frac{\epsilon_n}{1-\epsilon_n}\exp\left(\frac{n(a+c_1)^2}{2(b_n)^2}\right)\frac{(\alpha+M_n)^{M_n}}{\alpha^{M_n}H_0^{M_n}}.$$

Hence, the proof follows. □

S-3.2. Proof of Lemma 7.3. 

PROOF. Clearly, $E^c = \{$at least one $\theta_k$ in likelihood is in $[-a-c, a+c]^c\}$. We
have

$P(Z \in R_1^*, \Theta_{M_n} \in E^c, \sigma \le \sigma_n \mid Y_n)$

Y}™r 1] EL E zeVj L<a n Iw, L (®M n , z, Y n )dG n {a)dH{@ Mn ) 



E 2 / ff Je M ^ L(®M n ,z, Y n )dG n (o-)dH(@ M „ 

W„-l) 

>(M n -l) 



Ej=r } Eii E^ Ia<a„ Iw, ^( M„, z, F n )dG n (a)^(6^ 



Ej=i " ' E zeVj fa Ie Mn H®M n ,z, Y n )dG n {a)dH{Q Mn ) 
(22) 

Note that 
(23) 

L(e 1 ,., 2 ,r„) = n^- 3& -^) II . ' J« s( 

where $n_j = \#\{t : z_t = j\}$ and $\bar Y_j = \frac{\sum_{t:z_t=j}Y_t}{n_j}$.

Let $H_j(\theta_j \mid \Theta_{-jM_n})$ be the conditional distribution of $\theta_j$ given $\Theta_{-jM_n}$ and
$H_{-j}(\Theta_{-jM_n})$ the joint distribution of $\Theta_{-jM_n}$, where $\Theta_{-jM_n} = \Theta_{M_n} \setminus \theta_j$. Since

<24) | e_, M J = + — ^— f £ V 

l=i,l¥=j 
and $\sigma \le \sigma_n$, we have in the denominator for each $z \in V_j$,



(25) 



> 



> 



a 



dHj{0j | Q-jmJ 
1 + ^ / -,(^-%) 2 



a + M, 



( 

- o exp - 



2a 2 



dGo(^) 



a + M n ~ n V2 



where 5 is the lower bound of the density of Go on [Yj Tj2,Yj H — r/2] (we 

assume that the density of $G_0$ is strictly positive in neighborhoods of $\bar Y_j$, for each
$j$).

Thus for each $z \in V_j$ we have,



> 



(a + M n )n) /2 



n 

Jo Je. 



1 ^ _i r fn-yiy 

T (n-i)ll e 

i=i 

[I e 2 1 ' ^ dH-j{®_ jMn )dG n {a) 



(26) 



where 



a 



(a + M n )n' /2 



e~ 1/2 5 x ( n (j,z), (say), 



a 



n ■ 



1 ^ _i r f n-yi V 

71— 1) 1J. 

dH_j(@-j Mn )dG n (a) 



1=1 



(27) 



To obtain a lower bound for the numerator we note that for each $z \in V_j$ and
$j = 1(1)M_n$, $|\bar Y_j| < a$ (since each $|Y_i| < a$, $i = 1, \ldots, n$) and $\theta_j \in [-a-c, a+c]^c$.
This implies



-exp(-n J (y j -9j) 2 /2a 2 ) 



< 



1 n 



1/2 



n 



1/2 
3 

A? 



a 



exp (— njC 2 /Aa 2 ) exp(— c 2 /4cr 2 ) 



< -^eM-c 2 /^ 2 n ), 

where A\ = sup^ ^ / exp ^— "^r^ \. It is easy to check that is free of n. 
Thus for each z EVj, 



< 



n 



L(Q Mn ,z, Y n )dG n (a)dH(9 Mn ) 

cr<cr n JWi 

^exp(-c 2 /4cr 2 ) 



o Je 



■jM„ 



i Mil _i v /^wp 2 

(n-1) 11 C 

Z=l 



(J 



n, (Y,-6, 



dH- j (Q-. jMn )dG n (<r) 



< A* exp(-c 2 /4<7 2 ) x( n (j,z). 
As a result, using (22), we see that 

P^G^^Mn eE c ,a<a n \Y n ) 

E<=r 1} Eii E^ L<. n /wj ^ejfc, z, r„)dG n (a)dF(e M j 



< 



< 



< 



ZfJi~ 1] E ze y 3 L fe Mn H®M n ,z, Y n )dG n (a)dH(e 
A\ exp(-c 2 /4q 2 ) x ggj^ ill E^ CnQ", 

>(M„-1) 



(M„-l) ^(M„-l) 



a; ex P (-cV4a 2 ) x E}=i~ ECr ; E 2e y, C»(j\ 



a+M„ e 1/2( ^ X Ej=l ) EzeV,- Cn(j, 

(M n -l)^exp(-c 2 /4a 2 ) 

-1/2(5 



(28) 
proving the lemma. □

S-3.3. Proof of Lemma 7.4. 

Proof. Note that, 

P(Z e Rl@ Mn e E,a < a n \Y n ) 
T, zeRt Icr<cr n Je Mn eE I H®M n ,z, Y n )dG n (a)dH(Q Mn ) 
EJJq m L(Q Mn ,z,Y n )dG n (a)dH(e Mn ) 

Let $\Theta_{-z} = \Theta_{M_n} \setminus \Theta_z$. Let us assume, without loss of generality, that $\{\theta_1, \ldots, \theta_d\}$
is the set of $\theta_j$'s present in the likelihood with $d$ being the number of such $\theta_j$'s for
$z \in R_1^*$.

Since

I _E^" 1 Et:» t =j«-nO a _S J A l" 1 n J (?,-« 3 ) 2 

£(©M n ,^,^n) = — e xe ^ 

< — e 2„2 - 

- a n 

it follows that 

/ / L(e Mn ,z,Y n )dH(@ Mn )dG n (a) 

J®M n eE Jo 

< / / / -^e dH(G Mn )dG n (a) 

J 6ie[-a-c,a+c] J6_ 1Mb JO 17 

<Aj/ / dH(Q Mn )dG n {a) 

Je!e[-a-c,a+c] ie_iM„ 

(30) =^G ([-a-c,a + c])O(l-e n ), 



where A* = su P{(tG((W)} £ c ^ = e 2 - 

$C_n^{(1)} = \inf_{\{z \in R_1^*\}}\left(\sum_{j=1}^{d}\sum_{t:z_t=j}(Y_t - \bar Y_j)^2\right)$.

Clearly, for each $z \in R_1^*$ each term in $N$ is bounded above by

iV*=( — J e ^xG ([-a-c,a|c])xO(l-e n ). 
Hence

$$N \le (M_n - 1)^n N^*. \tag{31}$$

Let $C_n^{(2)} = \sup_{z \in R_1^*}\sum_{j=1}^{d}\sum_{t:z_t=j}(Y_t - \bar Y_j)^2$. Now, assuming that $k_n$ is a sequence
diverging to $\infty$ and denoting $R^{**} = \{\theta_1 \in [\bar Y_1 - k_n, \bar Y_1 + k_n], \ldots, \theta_d \in
[\bar Y_d - k_n, \bar Y_d + k_n]$, rest $\theta_j$'s are in $(-\infty, \infty)$, $nk_n \le \sigma \le 2nk_n\}$,



D 



3=1 Z^:z t =jl-'t I 3) 
" 2^ 



xe " 2a 2 dH(Q Mn )dG n {a) 



> 



R" 



2^2 



x e 2a 2 dH{Q Mn )dG n {a) 



( 1 _E^iE t , (=j (y t -? j ) 2 ' 
> inf — e 2^ 

{z,o-e[nfc n ,2nfc n ]} \ (J n 



E-=i%(^-^) 2 

x / e " 2a 2 ~ x dH{@ Mn )dG n (a) 

IR* 



L 



> l —J ' 8n2fc ™ x y 



e 8" 2 ^ x / x dH(@ Mn )dG n (a) 



1 \™ gj 2) „ ! /• 

' 2nk~) 6 ^xe'^x J dH(Q Mn )dG n (<r), 
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because a > nk n => e 2^ > e 2™ . Thus, 

,(2) d 



D > 



/ 1 \n_ cX> « 

n .7=1 



(32) x(^) xoy, 

assuming that n dG n (a) = 0(e n ) as well. 

Inequalities (31) and (32) imply that $\frac{N}{D}$ is of the order

e ^ xG ([-a-c,a + c})xO(l-e n ) 

C^> 7a 

(33) 

where rif=i ^oQ^j - Ki, % + M) 1 as n ~+ 00 ■ 

As shown in Section 2 of MB, Y,j=i Yltz t =j( Y t ~Yj) 2 converges to a\ almost 
surely. Thus for large n the possible values of Ylj=i Efat=j'(^* ~~ ^i) 2 w ^ ^ e 
close to a\ almost surely. From this we can say that for large n, it holds, almost 

surely, that Cn ~ Cr? ~ C n . We now investigate the appropriate order of C n 

such that 

(34) 

"^G ([-a- C ,a + c])O(l-e n )< — e ^0(e n ) 



holds for large n. 
Taking logarithm of both sides of (34) yields 



n log 
< nlog 

C n 



( 1 ) 


C n 


\bnVnJ 


2b n a 2 


( 1 ) 


C n 


\2nk n J 


8n 2 k 2 


1 


1 ) 


Lu n°n 


8n 2 k 2 ) 



+ lo 



l-e r 



+ log (H 



> nlog 



1 



+0 (lo, 



nlog 
1 - e n 



1 



+ log (flo) 



7? 



lo g(2^l +log(2nfc n ) 



+ lo 



1— e„ 



+ 



log(tfo) 



7? 



log 



o ( h lo? 



262 a 2 Sn 2 k 2 



l-er 



(35) ^ C„> 



Thus, (34) holds for $C_n$ given by (17) of MB. Hence, (18) of MB holds under
the additional assumption (17) of MB.

□ 



S-3.4. Proof of Theorem 7.1. 
Proof. 

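\[
E\left[\frac{1}{\sigma+k}\,\phi\left(\frac{y-\theta_i}{\sigma+k}\right) \Big|\, Y_n\right]
= \frac{\sum_z \int_{\Theta_{M_n}} \int_{\sigma} \frac{1}{\sigma+k}\,\phi\left(\frac{y-\theta_i}{\sigma+k}\right) L(\Theta_{M_n}, z, Y_n)\,dH(\Theta_{M_n})\,dG_n(\sigma)}
{\sum_z \int_{\Theta_{M_n}} \int_{\sigma} L(\Theta_{M_n}, z, Y_n)\,dH(\Theta_{M_n})\,dG_n(\sigma)} \tag{36}
\]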



N 



where 1(9^,2, V n ) = ^=1 e 
n j = #{k:z k = j},Y j = ±Ef. Zt = j Yt 



2^2 



3 ^ J J 

) likelihood of 6 



M„, 



To simplify the calculations we can split the set of all $z$'s into $R_1^*$ and $(R_1^*)^c$;
the cardinalities of the sets of all $z$-vectors satisfying these conditions are $(M_n - 1)^n$
and $M_n^n - (M_n - 1)^n$, respectively. Denote $I_1 = \{\Theta_{M_n} \in E^c, \sigma \le \sigma_n\}$, $I_2 =
\{\Theta_{M_n} \in E, \sigma \le \sigma_n\}$, $I_3 = \{\theta_i \in [-a-c, a+c]^c, \sigma \le \sigma_n\}$, $I_4 = \{\theta_i \in
[-a-c, a+c], \sigma \le \sigma_n\}$, $I_5 = \{\sigma > \sigma_n\}$, where $E$ has been defined in Section 7
of MB. Note that



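\begin{align*}
\{R_1^* \cap I_1\} \cup \{R_1^* \cap I_2\} &= R_1^* \cap \{\sigma \le \sigma_n\}, \\
\{(R_1^*)^c \cap I_3\} \cup \{(R_1^*)^c \cap I_4\} &= (R_1^*)^c \cap \{\sigma \le \sigma_n\}, \\
(\{R_1^* \cap I_1\} \cup \{R_1^* \cap I_2\}) \cup (\{(R_1^*)^c \cap I_3\} \cup \{(R_1^*)^c \cap I_4\}) &= \{\sigma \le \sigma_n\}.
\end{align*}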
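
The cardinalities quoted above for the two sets of allocation vectors can be checked by brute force for small values of $M_n$ and $n$. The following is only a minimal sketch, assuming that $R_1^*$ is the set of vectors $z \in \{1, \ldots, M_n\}^n$ in which the label 1 never appears (in line with the definition $R_j^* = \{z: \text{no } z_k = j\}$ used in Section S-5.2); the function name `count_allocations` is hypothetical.

```python
from itertools import product


def count_allocations(M, n, excluded=1):
    """Count allocation vectors z in {1..M}^n that avoid / contain the label `excluded`."""
    # Enumerate every z and count those in which `excluded` never appears.
    without = sum(1 for z in product(range(1, M + 1), repeat=n) if excluded not in z)
    total = M ** n
    return without, total - without


if __name__ == "__main__":
    M, n = 4, 5
    without, with_label = count_allocations(M, n)
    # Compare with the closed forms (M - 1)^n and M^n - (M - 1)^n.
    print(without, (M - 1) ** n)
    print(with_label, M ** n - (M - 1) ** n)
```

The enumeration is exponential in $n$, so it is meant only as a sanity check of the counting argument, not as part of any computation in the proof.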



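We write
\[
E\left[\frac{1}{\sigma+k}\,\phi\left(\frac{y-\theta_i}{\sigma+k}\right) \Big|\, Y_n\right] = S_1 + S_2 + S_3 + S_4 + S_5, \tag{37}
\]
where,
\begin{align*}
S_1 &= \frac{1}{D} \sum_{z \in R_1^*} \int_{I_1} \frac{1}{\sigma+k}\,\phi\left(\frac{y-\theta_i}{\sigma+k}\right) L(\Theta_{M_n}, z, Y_n)\,dH(\Theta_{M_n})\,dG_n(\sigma), \\
S_2 &= \frac{1}{D} \sum_{z \in R_1^*} \int_{I_2} \frac{1}{\sigma+k}\,\phi\left(\frac{y-\theta_i}{\sigma+k}\right) L(\Theta_{M_n}, z, Y_n)\,dH(\Theta_{M_n})\,dG_n(\sigma), \\
S_3 &= \frac{1}{D} \sum_{z \in (R_1^*)^c} \int_{I_3} \frac{1}{\sigma+k}\,\phi\left(\frac{y-\theta_i}{\sigma+k}\right) L(\Theta_{M_n}, z, Y_n)\,dH(\Theta_{M_n})\,dG_n(\sigma), \\
S_4 &= \frac{1}{D} \sum_{z \in (R_1^*)^c} \int_{I_4} \frac{1}{\sigma+k}\,\phi\left(\frac{y-\theta_i}{\sigma+k}\right) L(\Theta_{M_n}, z, Y_n)\,dH(\Theta_{M_n})\,dG_n(\sigma), \\
S_5 &= \frac{1}{D} \sum_{z} \int_{I_5} \frac{1}{\sigma+k}\,\phi\left(\frac{y-\theta_i}{\sigma+k}\right) L(\Theta_{M_n}, z, Y_n)\,dH(\Theta_{M_n})\,dG_n(\sigma).
\end{align*}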



Also let 

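\begin{align*}
P_1 &= P(Z \in R_1^*, \Theta_{M_n} \in E^c, \sigma \le \sigma_n \mid Y_n), \\
P_2 &= P(Z \in R_1^*, \Theta_{M_n} \in E, \sigma \le \sigma_n \mid Y_n), \\
P_3 &= P(Z \in (R_1^*)^c, \theta_i \in [-a-c, a+c]^c, \sigma \le \sigma_n \mid Y_n), \\
P_4 &= P(Z \in (R_1^*)^c, \theta_i \in [-a-c, a+c], \sigma \le \sigma_n \mid Y_n), \\
P_5 &= P(\sigma > \sigma_n \mid Y_n).
\end{align*}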

Let iff = sup {yfijia} { (^J) } = p Then the upper bounds of the 

terms $S_1, \ldots, S_5$ are given as follows.

\[ S_1 \le H_1 P(Z \in R_1^*, \Theta_{M_n} \in E^c, \sigma \le \sigma_n \mid Y_n) \le H_1 (M_n - 1) B_{M_n}, \tag{38} \]
from Lemma 7.3 of MB.

\[ S_2 \le H_1 P(Z \in R_1^*, \Theta_{M_n} \in E, \sigma \le \sigma_n \mid Y_n) \le H_1 \left(1 - \frac{1}{M_n}\right)^n \left(\frac{\alpha + M_n}{\alpha}\right)^{M_n}, \tag{39} \]
from Lemma 7.4 of MB.

\[ S_3 \le H_1 P(Z \in (R_1^*)^c, \theta_i \in [-a-c, a+c]^c, \sigma \le \sigma_n \mid Y_n) \le H_1 B_{M_n}, \tag{40} \]
from Lemma 7.5 of MB.

\[ S_5 \le H_1 P(Z \in (R_1^*)^c, \sigma > \sigma_n \mid Y_n) \le H_1 \epsilon^*_{M_n}, \tag{41} \]
from Lemma 7.2 of MB.



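\begin{align*}
S_4 &= \frac{1}{D} \sum_{z \in (R_1^*)^c} \int_{I_4} \frac{1}{\sigma+k}\,\phi\left(\frac{y-\theta_i}{\sigma+k}\right) L(\Theta_{M_n}, z, Y_n)\,dH(\Theta_{M_n})\,dG_n(\sigma) \\
&= \frac{1}{\sigma_n^*(y)+k}\,\phi\left(\frac{y-\theta_n^*(y)}{\sigma_n^*(y)+k}\right)
\times \frac{1}{D} \sum_{z \in (R_1^*)^c} \int_{\theta_i \in [-a-c, a+c]} \int_0^{\sigma_n} L(\Theta_{M_n}, z, Y_n)\,dH(\Theta_{M_n})\,dG_n(\sigma) \tag{42} \\
&= \frac{1}{\sigma_n^*(y)+k}\,\phi\left(\frac{y-\theta_n^*(y)}{\sigma_n^*(y)+k}\right)
\times P(Z \in (R_1^*)^c, \theta_i \in [-a-c, a+c], \sigma \le \sigma_n \mid Y_n) \\
&= \frac{1}{\sigma_n^*(y)+k}\,\phi\left(\frac{y-\theta_n^*(y)}{\sigma_n^*(y)+k}\right) (1 - P_1 - P_2 - P_3 - P_5), \tag{43}
\end{align*}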



where (42) is obtained by using GMVT, $\theta_n^*(y) \in (-a-c, a+c)$, and $\sigma_n^*(y) \in
(0, \sigma_n)$.

The integration and summation can be interchanged since the number of terms un- 
der summation is finite for a particular value of n. 

Note that the bounds in (38)-(41), and $P_1$, $P_2$, $P_3$, $P_5$, converge to zero under proper
conditions. In particular, $P_1$ converges to 0 if $\sigma_n$ is chosen to be sufficiently small.
Also, $b_n$ can be chosen to be very small such that it satisfies b 2 n < a\ for all n.

These choices get P3 to converge to and P5 converges to zero if y 12 — -< 



2(6n) 2 J (a)MnH^ 

$P_2$ converges to zero if $M_n \prec \sqrt{n}$; however, the form of the bound (18) given
by Lemma 7.4 of MB is valid if (17) of MB holds.
Now note that $S_1 + S_2 + S_3 + S_5 \le H_1 [P_1 + P_2 + P_3 + P_5]$. Since under the
specified assumptions $P_1$, $P_2$, $P_3$, $P_5$ converge to 0 as $n \to \infty$, the sum also goes
to 0 as $n \to \infty$. Thus in $S_4$, the term $(1 - P_1 - P_2 - P_3 - P_5) \to 1$ as $n \to \infty$.



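Uniform convergence of $\frac{1}{\sigma_n^*(y)+k}\,\phi\left(\frac{y-\theta_n^*(y)}{\sigma_n^*(y)+k}\right)$ to $\frac{1}{k}\,\phi\left(\frac{y-\theta^*(y)}{k}\right)$ can be proved in
exactly the same way, using Taylor's series expansion, as done in the case of the EW
model. In particular, it holds that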



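\[
\sup_{-\infty < y < \infty} \left| \frac{1}{\sigma_n^*(y)+k}\,\phi\left(\frac{y-\theta_n^*(y)}{\sigma_n^*(y)+k}\right) - \frac{1}{k}\,\phi\left(\frac{y-\theta^*(y)}{k}\right) \right| = O(\sigma_n). \tag{44}
\]
We also conclude that for $Y_n \in S_n$,
\begin{align*}
&\left| E\left[\frac{1}{\sigma+k}\,\phi\left(\frac{y-\theta_i}{\sigma+k}\right) \Big|\, Y_n\right] - \frac{1}{k}\,\phi\left(\frac{y-\theta^*(y)}{k}\right) \right| \\
&= \left| S_1 + S_2 + S_3 + S_5 + \frac{1}{\sigma_n^*(y)+k}\,\phi\left(\frac{y-\theta_n^*(y)}{\sigma_n^*(y)+k}\right) (1 - P_1 - P_2 - P_3 - P_5) - \frac{1}{k}\,\phi\left(\frac{y-\theta^*(y)}{k}\right) \right| \\
&\le S_1 + S_2 + S_3 + S_5 + \frac{1}{\sigma_n^*(y)+k}\,\phi\left(\frac{y-\theta_n^*(y)}{\sigma_n^*(y)+k}\right) (P_1 + P_2 + P_3 + P_5) \\
&\quad + \left| \frac{1}{\sigma_n^*(y)+k}\,\phi\left(\frac{y-\theta_n^*(y)}{\sigma_n^*(y)+k}\right) - \frac{1}{k}\,\phi\left(\frac{y-\theta^*(y)}{k}\right) \right| \\
&= O((M_n - 1) B_{M_n}) + O\left(\left(1 - \frac{1}{M_n}\right)^n \left(\frac{\alpha + M_n}{\alpha}\right)^{M_n}\right) + O(B_{M_n}) + O(\epsilon^*_{M_n}) + O(\sigma_n) \\
&= O\left(M_n B_{M_n} + \left(1 - \frac{1}{M_n}\right)^n \left(\frac{\alpha + M_n}{\alpha}\right)^{M_n} + \epsilon^*_{M_n} + \sigma_n\right). \tag{45}
\end{align*}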



Noting that the terms involving y are bounded above by 1/k, it follows that 

E(f SB (y\e Mn ,cT)\Y n )-l 



sup 

-oo<jy<oo 



< sup 

— oo<y<oo 



E 



-A 



(cr + k) \<J + k 



y-Oi 



k 

Y 



y-0*{y) 

k 

1 



v-o*(y) 
k 



O M n B>M„ + 1 



1 

Mr 



a + Mr, 



a 



+ e M n + a r 



(46) 

proving the theorem. 



□ 
S-3.5. Proof of Theorem 7.6.

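Proof. Recall that $J_1 = \frac{1}{D_1} \int_{I_4} \frac{1}{\sigma+k}\,\phi\left(\frac{y-\theta_i}{\sigma+k}\right) L(\Theta_n, Y_n)\,dH(\Theta_n)\,dG_n(\sigma)$,
and

$S_4 = \frac{1}{D_2} \int_{I_4} \frac{1}{\sigma+k}\,\phi\left(\frac{y-\theta_i}{\sigma+k}\right) \sum_{z \in (R_1^*)^c} L(\Theta_{M_n}, z, Y_n)\,dH(\Theta_{M_n})\,dG_n(\sigma)$, where
$D_1$ and $D_2$ denote the normalizing constants of the posteriors corresponding to the
EW and the SB models, respectively.
Let $L = \max(M_n, n)$. Then,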
| Ji — 54 1 



1 



Z>i (cr + k) \ a + k 



y 



xL(O n ,Y r 



i 



i 



y-Qj 

D 2 (a + k)' r \ a + k 



x L{@ Mn ,z,Y n ) 



y - Qi, n {y) 



dH(®L)dG n (o~ / 
x |P(/ 4 |r n )-P(( J Rl) c ,/ 4 |r„)| 



(47) 
(48) 



0. 



Step (47) follows using GMVT, where the notation has the usual meaning, and
step (48) follows because the first factor remains bounded and the second factor
goes to zero (since $P(I_4 \mid Y_n) \to 1$, and $P((R_1^*)^c, I_4 \mid Y_n) \to 1$). In other words,
$J_1$ and $S_4$ converge to the same model. Hence, we must have $\mu^*(y) = \theta^*(y)$. □

S-4. Proof of results associated with Section 8 of MB. 

S-4.1. Proof of Lemma 8.2. 
Proof. Note that 



Var 



E 



1 



y 



[a + k) \a + k 
V-0i 



1 



(a + ky 

(49) = j' 1 + j' 2 + j' 3 , 



a + k 



E 



1 



(a + k) \a + k 



where 

J 'l = h J Rl (I >< L (®n, Y n )dH(Q n )dG n (a) 
J 2 = hJnjl x L(e ni Y n )dH(Q n )dG n (a) 
J3 = h Ir 3 %n x L(@ n , Y n )dH(Q n )dG n (a) 

Clearly, 
(50) 
and 
(51) 



J 2 < H 3 x B n , 



■h < Hi x e* n , 



where H3 = sup{ y . CT } |£j n | < |. Abusing notation a bit, we re-define #3 = |. 

Denoting $P_1 = P(R_1 \mid Y_n)$, $P_2 = P(R_2 \mid Y_n)$, $P_3 = P(R_3 \mid Y_n)$, we concentrate
on the term



J 



^[[ I £ n xL(G n ,Y n )dH(G n )dG n (a) 

U J J 9i£ [— a— c,a+c\ J cr<a n 



V - m n {y) 
{T n {y) + kY V r n (y) + k 



E 



[a + k) \a + k 



^[f f UxL(Qn,Y n )dH(& n )dG n (a) 

U J J 9iG[—a—c,a+c] J cr<a„ 



(52) 



applying GMVT, where, for every $y$, $m_n(y) \in (-a-c, a+c)$, and $\tau_n(y) \in
(0, \sigma_n)$.

Now we consider the following term: 
^// / UxL(Q n ,Y n )dH(Q n )dG n (a) 

U J J di£[—a—c,a+c] J a<a n 

L(@ n ,Y n )dH(e n )dG n (a) 



1 

" D 
-E 



i£[-a-c,a+c\ 



*<a n i<y + k) \a + k 



V-Oi 



(cr + k) \a + k 

-r" t" 
= J 1 +J 2 . 

(53) 



Y n ) P [9i £ [-a - c, a + c], a < a r . 
For the part $J_1''$, we note that

~ // / r ^c/ ) ( 1 ^)L(e n ,Y n )dH(@ n }dG n (a) 

D J Je % e[-a-c,a+c} Ja<a n \° + «) V ° + k ) 

(v n (y) + k)^ \v n (y) + k) ^ P2 ^ 

(54) 

From (13) and following Theorem 2.4 it follows that 
Thus, 



1 ^ ( y-V*n(y)\ r, , r> r> , , , r> n ,2] 



(i>n(z/) + fc) V y n(y) + ^ 

-o/ 2 + j 3 )(i-p2-p 3 ; 



) [(l-p 2 -p 3 )-(l-p 2 -p 3 ) 2 



(v n (y) + k) \v n {y) + kj 
+H 1 (P 2 + P 3 )(l - P 2 - P 3 ) 

(56) = o(p n + e ;), 

since (^OT 1 ) < P p 2 = 0(P n ) and P 3 = 0(£). 



□

S-4.2. Proof of Lemma 8.3.

PROOF. The covariance term can be written as,

covij = J{ + J 2 + J3 , where 

J l = ^l ^nL(Q nt Y n )dH(e n )dG n (a), 

MnL{Q n , Y n )dH(Q n )dG n (a), 



J. 



T* 



1 

D 
1 

D 



R<3 



UCjnHBn, Y n )dH(e n )dG n (a), 



$R_1 = \{\theta_i \in [-a-c, a+c], \theta_j \in [-a-c, a+c], \sigma \le \sigma_n\}$,
$R_2 = \{$at least one of $\theta_i$ and $\theta_j$ is in $[-a-c, a+c]^c$, $\sigma \le \sigma_n\}$,
(57) $R_3 = \{\sigma > \sigma_n\}$.



Also denote $P_1 = P(R_1 \mid Y_n)$, $P_2 = P(R_2 \mid Y_n)$, $P_3 = P(R_3 \mid Y_n)$. Note
that,



(58) 
and 



J3 < HiP 3 < Hfe* n 



J2 < H 3 P 2 



< H 3 P [Oi G [a - c, a + c] c , cr < cr r> 



+H 3 P [6j e[a-c,a + c] c , a < a n 



(59) 



Consider the term 

Jl - 



^ [ ^jnL(e n ,Y n )dH(e n )dG n (a) 

4& / Z jn L(e n ,Y n )dH(e n )dG n (a) 
u Jr' 



(60) 

where & 



, fin G (-a- 



c, a + c), and r)i n G (0, a n ). Equation (60) is obtained by applying GMVT. We 
will study convergence of the term 

f R[ £ jn L(@ n , Y n )dH{Q n )dG n {a). 
D 



l 



H jn L(O n ,Y n )dH(O n )dG n (a) 



y - <P2n(y) 
1 

h + h + Li + 



'1 - P, - P~ 



y - ^2n(y) 



(V2n(y) + k) \mn(y) + k 



1-^2-^3 



1 - Po ~ P, 



where 



1 



1 



(61) 



< fTiP(fli) 

< ^iP(P 2 ) 

< HiB n , 



a + k 



L(Q n ,Y n )dH(O n )dG n (a) 



From (61), the equations (58), (59) (the latter two showing that $P_2 = O(B_n)$,
$P_3 = O(\epsilon_n^*)$), and $\xi_n^* = O(1)$, it follows that $J_1^* = O(B_n + \epsilon_n^*)$. Finally, we
have,

\[ \mathrm{cov}_{ij} = O(B_n + \epsilon_n^*). \]



□

S-5. Proofs of results associated with Section 9 of MB.



S-5.1. Proof of Lemma 9.2. 

i (y 



Var 
E 



(a + k) T \a + k 
1 / !J 



(a + k) \a + k 



Y 

E 



y-Qi 

(a + k) T \a + k 



As in (45) we begin with splitting up the range of $z$ and the range of integration
of $\Theta_{M_n}$ and $\sigma$ in the following way:



E 



1 



(a + k) \a + k 
si + S 2 * + S* 3 + S* 4 + S|, 



E 



1 



(a + k) \a + k 



(62) 
where $S_i^*$ has the same ranges of $z$, $\Theta_{M_n}$ and $\sigma$ as $S_i$ in Theorem 7.1 of MB; only the
integrand of the former is now replaced by

l a ( y-6i \ _ E I i 



CT+fc 



, that is 



{o+k)^ \ a+k J \ (cr+fc) 

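\begin{align*}
S_1^* &= \frac{1}{D} \sum_{z \in R_1^*} \int_{I_1} \zeta_{M_n}^2 L(\Theta_{M_n}, z, Y_n)\,dH(\Theta_{M_n})\,dG_n(\sigma), \\
S_2^* &= \frac{1}{D} \sum_{z \in R_1^*} \int_{I_2} \zeta_{M_n}^2 L(\Theta_{M_n}, z, Y_n)\,dH(\Theta_{M_n})\,dG_n(\sigma), \\
S_3^* &= \frac{1}{D} \sum_{z \in (R_1^*)^c} \int_{I_3} \zeta_{M_n}^2 L(\Theta_{M_n}, z, Y_n)\,dH(\Theta_{M_n})\,dG_n(\sigma), \\
S_4^* &= \frac{1}{D} \sum_{z \in (R_1^*)^c} \int_{I_4} \zeta_{M_n}^2 L(\Theta_{M_n}, z, Y_n)\,dH(\Theta_{M_n})\,dG_n(\sigma), \\
S_5^* &= \frac{1}{D} \sum_{z} \int_{I_5} \zeta_{M_n}^2 L(\Theta_{M_n}, z, Y_n)\,dH(\Theta_{M_n})\,dG_n(\sigma).
\end{align*}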



y-Qj 



E 



(cr+fc)^ \a+k J I (a+k) 

in the same way as in equations (38)— (41) it follows that 



y-e, 

CT+fc 



§. Then 



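\[ S_1^* \le (H_2^*)^2 P(Z \in R_1^*, \Theta_{M_n} \in E^c, \sigma \le \sigma_n \mid Y_n) \le (H_2^*)^2 (M_n - 1) B_{M_n}, \tag{63} \]

\[ S_2^* \le (H_2^*)^2 P(Z \in R_1^*, \Theta_{M_n} \in E, \sigma \le \sigma_n \mid Y_n) \le (H_2^*)^2 \left(1 - \frac{1}{M_n}\right)^n \left(\frac{\alpha + M_n}{\alpha}\right)^{M_n}, \tag{64} \]

\[ S_3^* \le (H_2^*)^2 P(Z \in (R_1^*)^c, \theta_i \in [-a-c, a+c]^c, \sigma \le \sigma_n \mid Y_n) \le (H_2^*)^2 B_{M_n}, \tag{65} \]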



S 5 * < (H*yP (Z e (Rl) c , 9i e[-a-c,a + c],a<a n \Y n )< (H*) 2 el In . 
(66) 



St = Jjf [ [ ( 2 Mn L(Q Mn ,z,Y n )dH(e Mn )dG r 



1 



E 



1 



y-Oi 



Y n 



(67) 



D 



{(K(y) + k)}^ \<jl{y) + k) \{a + k) T \a + k 

(M n L(®M n ,z, Y n )dH{Q Mn )dG n {a). 



Q-iM n J Oi£[-a-c,a+c} Ja<cr n 
Let $R' = \{\theta_i \in [-a-c, a+c]$, rest $\theta_l$'s are in $(-\infty, \infty)$, $\sigma \le \sigma_n\}$. Then we
consider the following:



h I (M n L(e Mn ,z,Y n )dH(@ Mn )dG n (a 
u Jr' 

y-0 



1 



1 



D J R > (a + k) \a + k 

- E {^Tk)^{jri 

S1+S2, say. 



L(@ Mn ,z,Y n )dH(@ Mn )dG n (a) 
Y n )P{0ie [-a - c,a + c],a < a n \Y n ) 



(68) 



The terms S 1 and S 2 can be dealt with in the same way as J 1 and J 2 were 
handled in the corresponding EW case and it can be shown that 



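\[ S_4^* = O\left(M_n B_{M_n} + \left(1 - \frac{1}{M_n}\right)^n \left(\frac{\alpha + M_n}{\alpha}\right)^{M_n} + \epsilon^*_{M_n}\right). \tag{69} \]

Thus, $\sum_{i=1}^{5} S_i^* = O\left(M_n B_{M_n} + \left(1 - \frac{1}{M_n}\right)^n \left(\frac{\alpha + M_n}{\alpha}\right)^{M_n} + \epsilon^*_{M_n}\right)$. Hence, the
lemma follows.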



S-5.2. Proof of Lemma 9.3. Let 



fi(y,0i,6j,(T) 





(y-6i\ 


(<7+fc)^ 


\<j+k ) 



E 



1 





The covariance term is given by 



(cr+fe)^ \ <T+k 

Y 



= Ethiy^O^a) Y^j 

= E / / fi(y,O i ,O j ,a)L(Q Mn ,z,Y n )dG n (a)dH(Q Mn ), 



(70) 



We begin by bounding the following term 

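\begin{align*}
\beta_1 &= \frac{1}{D} \sum_{z} \int_{\sigma_n}^{\infty} \int f_1(y, \theta_i, \theta_j, \sigma) L(\Theta_{M_n}, z, Y_n)\,dG_n(\sigma)\,dH(\Theta_{M_n}) \\
&\le (H_2^*)^2 P(\sigma > \sigma_n \mid Y_n) \\
&\le (H_2^*)^2 \epsilon^*_{M_n}, \tag{71}
\end{align*}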
using Lemma 7.2 of MB, where 



#2 - su P{yA^} 



y-Oi 

a+k 



< §. In fact, abus- 



ing notation a bit, we re-define H 2 = |. 



When $\sigma \le \sigma_n$, for simplifying calculations, we split the range of $z$ as $\{R_1^* \cap R_2^*\}
\cup \{R_1^* \cap (R_2^*)^c\} \cup \{(R_1^*)^c \cap R_2^*\} \cup \{(R_1^*)^c \cap (R_2^*)^c\}$, where $R_j^* = \{z:$ no
$z_k = j\}$. The respective cardinalities are $(M_n - 2)^n$, $(M_n - 1)^n - (M_n - 2)^n$,
$(M_n - 1)^n - (M_n - 2)^n$ and $M_n^n - 2(M_n - 1)^n + (M_n - 2)^n$.
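
The four cardinalities of this partition follow from inclusion-exclusion and can be verified by enumeration for small cases. Below is a minimal sketch under the assumption that $R_1^*$ and $R_2^*$ are the sets of allocation vectors that never use labels 1 and 2, respectively; the helper names are hypothetical.

```python
from itertools import product


def partition_counts(M, n):
    """Return the sizes of R1&R2, R1&R2^c, R1^c&R2, R1^c&R2^c over z in {1..M}^n."""
    both = only_r1 = only_r2 = neither = 0
    for z in product(range(1, M + 1), repeat=n):
        in_r1 = 1 not in z  # label 1 never used
        in_r2 = 2 not in z  # label 2 never used
        if in_r1 and in_r2:
            both += 1
        elif in_r1:
            only_r1 += 1
        elif in_r2:
            only_r2 += 1
        else:
            neither += 1
    return both, only_r1, only_r2, neither


def closed_form(M, n):
    """Closed-form counts stated in the text, in the same order."""
    a, b, c = (M - 2) ** n, (M - 1) ** n, M ** n
    return a, b - a, b - a, c - 2 * b + a


if __name__ == "__main__":
    print(partition_counts(4, 3), closed_form(4, 3))
```

The four counts always add up to $M_n^n$, which is a quick check that the partition is exhaustive.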

Now consider the sum $\sum_{z \in R_1^* \cap R_2^*} \int_0^{\sigma_n} \int f_1(y, \theta_i, \theta_j, \sigma) L(\Theta_{M_n}, z, Y_n)\,dG_n(\sigma)\,dH(\Theta_{M_n})$.
In this sum neither $\theta_i$ nor $\theta_j$ is represented. Splitting the range of integration
into $I_1 \cup I_2$, we obtain the following bounds:

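\begin{align*}
\beta_2 &= \frac{1}{D} \sum_{z \in R_1^* \cap R_2^*} \int_{I_1} f_1(y, \theta_i, \theta_j, \sigma) L(\Theta_{M_n}, z, Y_n)\,dG_n(\sigma)\,dH(\Theta_{M_n}) \\
&\le (H_2^*)^2 P(Z \in R_1^* \cap R_2^*, \Theta_{M_n} \in E^c, \sigma \le \sigma_n \mid Y_n) \\
&\le (H_2^*)^2 (M_n - 2) B_{M_n} \tag{72}
\end{align*}
and
\begin{align*}
\beta_3 &= \frac{1}{D} \sum_{z \in R_1^* \cap R_2^*} \int_{I_2} f_1(y, \theta_i, \theta_j, \sigma) L(\Theta_{M_n}, z, Y_n)\,dG_n(\sigma)\,dH(\Theta_{M_n}) \\
&\le (H_2^*)^2 P(Z \in R_1^* \cap R_2^*, \Theta_{M_n} \in E, \sigma \le \sigma_n \mid Y_n) \\
&\le (H_2^*)^2 \left(1 - \frac{2}{M_n}\right)^n \left(\frac{\alpha + M_n}{\alpha}\right)^{M_n}. \tag{73}
\end{align*}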



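We follow the same procedure to obtain bounds for the ranges $R_1^* \cap (R_2^*)^c$ and
$(R_1^*)^c \cap R_2^*$.

\begin{align*}
\beta_4 &= \frac{1}{D} \sum_{z \in R_1^* \cap (R_2^*)^c} \int_{I_1} f_1(y, \theta_i, \theta_j, \sigma) L(\Theta_{M_n}, z, Y_n)\,dG_n(\sigma)\,dH(\Theta_{M_n}) \\
&\le (H_2^*)^2 (M_n - 1) B_{M_n} \tag{74}
\end{align*}
and
\begin{align*}
\beta_5 &= \frac{1}{D} \sum_{z \in (R_1^*)^c \cap R_2^*} \int_{I_1} f_1(y, \theta_i, \theta_j, \sigma) L(\Theta_{M_n}, z, Y_n)\,dG_n(\sigma)\,dH(\Theta_{M_n}) \\
&\le (H_2^*)^2 (M_n - 1) B_{M_n}. \tag{75}
\end{align*}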
As = \ I fi(y,OhOj,v)L(&M n ,z,Y n )dG n (a)dH(& Mn ) 



(76) < (l-^) _(i 

and 



M„ / V M n \ a 



(77) < (^)2((i__) _(i 



1 \" / 2 \"\ /a + M^ Mn 



M n J V M nJ J \ <« 
Inequalities (72)-(77) associated with $\beta_2, \ldots, \beta_7$ yield



1 /" CTn f 

n E / / /i(l/,ft,e i ,a)i:(eji tfB ^,Y- n )dG n ( < r)dff(e J i tfn ) 

1 /" CTn /" 

+ n J2 h{y,eiA,°) L (®M n ,z,Y n )dG n {(T)dH(QM n ) 

1 /" CTn /" 

+ n J2 h{y,eiA,°) L (®M n ,z,Y n )dG n {(T)dH{QM n ) 



ze(Rl) c nR^ 



M„,y V M„.J J \ a 

o j :ui7,,, + (1 



1 \ n /a + M n x M " 



M n J\ a 



(78) 



Now we concentrate on the part $z \in (R_1^*)^c \cap (R_2^*)^c$. We split the range of integration
of $\Theta_{M_n}$ and $\sigma$ as the union of $I_1^* = \{$Either $\theta_i$ or $\theta_j$ is in $[-a-c, a+c]^c$,
$\sigma \in (0, \sigma_n)\}$ and $I_2^* = \{$Both $\theta_i$ and $\theta_j$ are in $[-a-c, a+c]$, $\sigma \in (0, \sigma_n)\}$. We then
have



ft = ^ / h{yA,03,°)L(®M n ,z,Y n )dG n ((T)dH(®M n ) 

(79) < {H{fB Mn . 
- y 

1 



D 



(<7in{y) + k) \a in {y) + k 



fi(y, 9 h 6j,a)L{Q Mn ,z, Y n )dG n (a)dH(Q Mn ) 
y - in {y) 



E 



l 



y 



1 



a + k) \a + k 
1 



1 

x — x 
D 



E 



(a + k) T V o + k 
V-Oj 



(a + k) \a + k 



x L(e Mn ,z,Y n )dG n (a)dH(e Mn ), 



(80) 



applying GMVT, where, for each $y$, $\theta_{in}(y) \in (-a-c, a+c)$, and $\sigma_{in}(y) \in
(0, \sigma_n)$.

Now applying GMVT to the second factor of (80) we get



- T 

D 



ze(/?*) c n(i?*) c J 2 



1 



(cr + A;) V o" + 
- 1 n 



L(Q Mn ,z, Y n )dG n (a)dH(Q Mn ) 

y-oi 



+ k) \a + k 



Y n 



(cr + k) \a + k 

(ojn(y) + A;) Vainly) + k J \(<7 + 
x P(Z6 (^) c n (i^) c , 0i G (-o-c,o + c), 

G [—a — c, a + c], a < a n \Y 

(81) 



where the symbols have the same meanings as in (80).

Letting $\beta_i^*$ stand for the posterior probabilities associated with $\beta_i$; $i = 1, \ldots, 9$,
note that

PI = P{Z G (i?l) c n (^) c ,^ G [-a-c,a + c], 
% G [-a - c, a + c],a < a n \Y n ) 



i=i 
where, for each $i = 1, \ldots, 8$, $O(\beta_i^*) = O(\beta_i)$. Following the same method as for
obtaining (69) it can be shown that

,82) ft = U„„ + (l 



Thus, for any $i, j = 1, \ldots, M_n$; $i \ne j$, the order of the covariance term can be
summarized from (71), (78), (79) and (82) as,

(83) r = *, = o (uB Mn + (i - (^) M " + ek ) • 

The order of £^ V. Co, (*=|) , ^ (0) |y B ) = 

M " ( ^~ 1} Taking M n to be sufficiently large it can be ensured that M " ( ^~ 1} = 
1 and hence the lemma follows. 

S-6. Proofs of results associated with Section 12 of MB. 

S-6.1. Proof of Lemma 12.1. 

PROOF. The proof will follow in the same way as in Lemma 6.2. The form of 
the likelihood now is 

j r^Ti V^P ~ ^ij ^ 

(84) L (6 n , z, Y) = —^e~ i=1 ^ 

Note that $|y_{ij}| \le a$ and $|\theta_{ij}| \le c_1$ imply that $\sum_{i=1}^{n} \sum_{j=1}^{p} (y_{ij} - \theta_{ij})^2 \le np(a + c_1)^2$.
Also note that

dH{@ n ) 



> 



B-i J8iE[— a— c,a+c]P 
a 



(85) = (-^_) f/J*\ 

These observations, with the same calculations as associated with Lemma 6.2,
yield the required result.

□ 
S-6.2. Splitting the integral $\int\int_{\Theta_n} L(\Theta_n, Y_n)\,dH(\Theta_n)\,dG_n(\sigma)$.

L(e n ,Y n )dH(e n )dG n (a) 
r [ [ L(Q n ,Y n )dH(@ n )dG n (a) 

JO J J 9 ll e[-a-c,a+c] c 

+ r [ [ L(e n ,Y n )dH(Q n )dG n (a) 

JO J J One[-a-c,a+c] 

[ [ L(Q n ,Y n )dH(Q n )dG n (a) 

Jo J J ene[-a-c,a+c] c 

+ E r I I L(&n,Y n )dH(@ n )dG n (a) 
j=2 Jo J Jw tj 

+ r [ [ L(@ n ,Y n )dH(e n )dG n (a) 

Jo J J a— c,a+e] J 9i p £[—a—c,a+c] 



L(Q n ,Y n )dH(Q n )dG n (a) 

o Je x eE c 

(86) + [ L{e n ,Y n )dH(G n )dG n (a), 
Jo JdieE 

where Wij={0n G [—a — c, a + c], . . . , Oi(j-i) & [—a — c, a + c],9ij G [—a — 
c, a + c] c , rest 6>j's are in W}, j = 2, . . . , n\ E={6u G [—a — c, a + c], V/, rest #;'s 
are in K p }; Sft representing the real line. 

S-6.3. Proof of Lemma 12.2. 

PROOF. The proof follows in the same way as that of Lemma 6.3. 
Note that 

P (6 n e E c ,a < a n \ Y n ) 

IJ 4 l6[ - a - c , a+c] c L(9 B , Y n )dH(@ n )dG n (a) 



JJ en L(e n ,Y n )dH(e n )dG n (a) 

+ / ff / 0B L(e n ,i%Odff(e n )dG n (<o 

(87) 

In the same way as in Lemma 6.3 it can be shown that each of the p component 
probabilities is of the order B n = . Hence, it is easy to see that 

\[ P(\Theta_n \in E^c, \sigma \le \sigma_n \mid Y_n) = O(pB_n). \]
□

S-7. Splitting of $\int_0^{\sigma_n} \int_{\Theta_{M_n}} L(\Theta_{M_n}, z, Y_n)\,dH(\Theta_{M_n})\,dG_n(\sigma)$ in the case
of the "large p small n problem" associated with the SB model. As before, for
$z \in R_1^*$, denote by $d$ the number of $\theta_l$'s present in the likelihood, $Z$ the set of $\theta_l$'s that
are present in the likelihood, and $\theta_1, \ldots, \theta_d$ the $\theta_l$'s present in the likelihood. Here
also, for $z \in R_1^*$, we split the integral $\int_0^{\sigma_n} \int_{\Theta_{M_n}} L(\Theta_{M_n}, z, Y_n)\,dH(\Theta_{M_n})\,dG_n(\sigma)$
as
r<? n r 

/ / L(e Mn ,z,Y n )dH(e n )dG n (a) 
Jo Je Mn 

= nil L(@ Mn ,z,Y n )dH(e Mn )dG n (a) 

JO J J 8 ii 6[- a- c,a+c] 
d p 

EE/7 / M0M„,*,yn)d#(eMjdG n (c7) 
i=li=1 J y ^ 

T" / L(@ Mn ,z,Y n )dH(@ Mn )dG n (a) 

JO JOMr,^ 



3 

+ 



- n 



L (6 MnI z, y n ) dH(& Mn )dG n (a) 

On 



T" / L(@ Mn ,z,Y n )dH(@ Mn )dG n (a), 
Jo ie M „£E 



(88) 



where E ={all 6>;'s in likelihood are in [— a — c, a + c], rest are in W}, Wji={9ji G 
[—a — c, a + c], . . ., G [—a — c, a + c], G [—a — c, a + c] c , rest 6^'s are 

in ft*}. 

S-8. Proofs of results associated with Section 13 of MB. 

S-8.1. Proof of Theorem 13.1. 

PROOF. Let $Y_n \in S_n$, as before. Now note that

1 Jv-i 



E 



Y n 



a + k) \a + k 
1 jJ Q <f>(^^jL(®n,Y n )dH(@ n )dG n (o-) 



D JaJe n + k 
(89) < Hi 
where R x = sup {yM { ^0 (^j) } = \. As a result, 



1 sr^ E ( i f y-Qj 
a + n ~{ \{cr + k) \a + k 

a + n 

(90) -> 0. 

Now consider 

if r r i (»-vn) 2 



n / / / r^n e L(e n , r n )dG (e„+i)dfl-(e n )dG n (<7) 

V Je n+1 Je n J a (cr + fcj 



= 1 / / / ^-yve L(6 n , F n )dG o (0 n+ i)diT(e n )dG n (<7) 

(»- g n+l) 



x+1 

^2 



^ / If r-^T\ e J ^°^L(Qn,Y n )dG (e n+ i)dH(@ n )dG n (a) 
Je n+1 Je n Ja>a n (? + k) 



D 

= Wi + W 2 (say). 
(91) 



1 (y-e n+ i) 2 

J- — — n — 



W 2 = ±- I II T^-rre L(e n ,Y n )dG (9 n+1 )dH(& n )dG n (a) 

Thus, by Lemma 6.2 of MB,
\[ W_2 = O(\epsilon_n^*). \tag{92} \]

As regards $W_1$, an application of GMVT yields,

1 f f f 1 - (y-^+if 



W 1 = i / / / ^yre L ( 6n , r n )dG o (0„+i)d#(e n )dG n (<r) 



(y-e n +i? 



I < w YTT\ e 2 «W +fc > 2 dGo(0 n+ i) x P(<r < <7 n |y n ) , 

7e n+1 + fc) 



DCT ensures that 



f ( _ (y- e n+i) 2 

K(y) + fc) 



(93) -> / -e dG ((9„+i). 
It then follows from (93) and the fact that $P(\sigma \le \sigma_n \mid Y_n) \to 1$, that

f 1 (v-o n +if 

(94) / -e ^dG (9 n+1 ). 

J6 n +i K 

Finally, (92) and (94) guarantee Theorem 13.1. □ 
S-8.2. Proof of Lemma 13.2. 
Proof. 

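\[
P(Z \in R_1^*, \Theta_{M_n} \in E, \sigma \le \sigma_n \mid Y_n) = P(Z \in R_1^*, \Theta_{M_n} \in E \mid Y_n)
- P(Z \in R_1^*, \Theta_{M_n} \in E, \sigma > \sigma_n \mid Y_n). \tag{95}
\]

We first obtain a lower bound for $P(Z \in R_1^*, \Theta_{M_n} \in E \mid Y_n)$.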
P(ZeRl,G Mn £E\Y n ) 

Z z eRl L<« n f e ^ E fL(Q Mn ,z,Y n )dG n (*)dH(0 Mn ) 
E z L Ie Mn L(&M n ,z, Y n )dG n (a)dH(Q Mn ) 



(96) 



where N = J L(Q Mn ,z, Y n )dG n (a)dH(Q M J and 

D = I L(@ Mn ,z, Y n )dG n {a)dH{Q Mn ). 

Denote $E^* = \{\theta_j \in [-a-c, a+c] \cap [\bar{y}_j - k_n, \bar{y}_j + k_n],\ j = 1, \ldots, d;\ \theta_j \in
(-\infty, \infty),\ j = d+1, \ldots, M_n\}$ and $E_j = \{\theta_j \in [-a-c, a+c] \cap [\bar{y}_j - k_n, \bar{y}_j + k_n]\}$
for $j = 1, \ldots, d$. Let $k_n$ be a sequence of constants such that $k_n \to \infty$.
Note that,

iV > / / — e 2a 2 



: I'M, »:f 

xe 2a 2 dH(@ Mn )dG n {a) 



( 1 \ n c " 1 / a V 

£ (55- j - * j - n <*> x 

j=l 

(97) 

assuming n dG n (o) = 0(e n ). 
Since $k_n \to \infty$ as $n \to \infty$, $G_0(E_j) \sim G_0((-\infty, \infty)) = 1$ and
/ 1 \ n _ cl 2) 1 / a \ Mn 



Now, 

' n ' 



e (99) 2a 2 2rr* < I > ( ) r -l 

for < cr < oo. This implies 



(n \ 2 n 
GPV 

Since $C_n^{(1)} \sim C_n^{(2)} \sim C_n$ for large $n$, let us obtain the condition under which



n \ 2 



<101) UsJ "'""^H^l «-» 

Let $k_n = r_n C_n$, where $r_n \to 0$ and $r_n C_n \to \infty$.



Then, 



2nk n J V G n 



« -nlog(C„) + 5log(C„)- '" 



2 



2 toV n/ 8n 2 r 2 G 2 

Tl Tl 

> - log(n) - - + nlog(2n) + nlog(r„) - O (log(e n )) 



-log(G n ) + 



2 - - 8n 2 r 2 G 2 

Tl Tl 

< -- log(n) + - - nlog(2) - nlog(n) - nlog(r„) - O ( log ( - L 

4^ - log(C n ) + 



2 b — 8n 2 r 2 G 2 
< n ( - | log(n) - log(r n ) + X - - log(2) J - O flog f i- 



(102) 
In the R.H.S. of (102) the term involving $\log(2)$ is a constant and $\epsilon_n$
is chosen independently. So, we can choose $r_n$ sufficiently small such that the term
$n(-\log(n) - \log(r_n))$ dominates the other terms.

Let C n = where s > 2. 

Then k n = r n C n = 3 _\ 2 — > oo, for r n going to zero at a sufficiently fast rate. 

Also, n 2 r 2 n C n = ^4 = -> oo, for s > 2. 

And, 

(103) - log(Cn) = — — log(r n ) - nlog(n) < -nlog(r n ) - nlog(n). 

So, for C n = O ( -^-1 J ; s > 2, if r n is fixed to be sufficiently small such that 

for large n, 8n 2 r 1 2 C a ~ 0, and O (log f^j) ~< -nlog(r n ) - nlog(n). Then, as 
n — >■ oo, (102) holds, and 

Hence, it follows that 

c (2) 

(104) a (^H 1 ^)"' 

Now we obtain an upper bound for $P(Z \in R_1^*, \Theta_{M_n} \in E, \sigma > \sigma_n \mid Y_n)$.

$P(Z \in R_1^*, \Theta_{M_n} \in E, \sigma > \sigma_n \mid Y_n)$

g*egj IZ Ib^e I L(&M n ,z, Y n )dG n (a)dH(Q Mn ) 

E z L Ie Mn H®M n ,z, Y n )dG n (a)dH(Q Mn ) 
ZzeRj N 

(105) < 

where N = Q J QzeE J L(Q Mn ,z, Y n )dG n (a)dH(Q M J and 
D = fJ QMn L(e Mn ,z,Y n )dG n (a)dH(e Mn ). 

Note that, in the same way as we have obtained equation (99), it can be shown that

(106) N<(^-) e-2 xO(e n ). 



D > / L(@ Mn ,z,Y n )dG n (a)dH(e Mn ) 

J <J>(T n J @M n 

j=i 

( 1 \ n - c " ' i / a \ Mn 

<107) S (ssj e ^ x ^Utm^) x0(e ">' 

since for large $n$, $G_0(E_j) \sim G_0((-\infty, \infty)) = 1$.
Since $C_n^{(1)} \sim C_n^{(2)} \sim C_n$ it follows that

nan < fe) 2e ' ix0(f ") 

V D / 1 \ n %V i / \ M ™ 

^ 2 (i) e ^ x e"^ (^) x 0(£n) 

Choose C n such that 

<109) (i) v * x0 < ^ »> s (<l) ,e ~ , • 
which is exactly the same condition as in the last case of the lower bounds. So, as
$n \to \infty$,



(110) P(ZeRl,e Mn eE,a>a n ) < 



Hence, 



a + M r , 



a 



M„ 



1 

~Mr, 



xO(e r 



P(ZeRl,e Mn eE,a<a n \Y n ) 

P(Ze R$,e Mn eE,a< a n \Y n ) - P (Z € Rl, S Mn eE,a> a n \Y r 



> 



a 



a + M n 



a + Mr. 



a 



Mn 



x 0(e n ) 



1 



1 

M~n 



(111) 

We must have 



a 



(112) 



a + M, 
& 0{e n )< 



Mr, 



> 



a 



a + M n 
a 

2M„ 



M n 



x 0(e n ) 



a + M n 



Using L'Hôpital's rule it can be shown that $\left(\frac{\alpha}{\alpha + M_n}\right)^{2M_n} \to 1$ if $M_n = n^b$,
$\alpha = n^{\omega}$, $\omega > 0$, $b > 0$ and $\omega - b > b$. Since $\epsilon_n \to 0$, for large $n$, (112) holds and
does not contradict the assumptions regarding $\epsilon_n$.
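
The limit invoked here can also be illustrated numerically. The sketch below assumes the quantity of interest is a power of $\alpha/(\alpha + M_n)$ with exponent proportional to $M_n$ (the constant `c` is a free parameter of the illustration, not something fixed by the proof), and takes $M_n = n^b$, $\alpha = n^{\omega}$. Since $-cM_n \log(1 + M_n/\alpha) \approx -c\,n^{2b - \omega}$, the power tends to 1 when $\omega - b > b$ and collapses to 0 otherwise.

```python
import math


def ratio_power(n, b, omega, c=1.0):
    """Return (alpha / (alpha + M_n)) ** (c * M_n) with M_n = n**b and alpha = n**omega."""
    m_n = n ** b
    alpha = n ** omega
    # log1p keeps precision when M_n / alpha is tiny.
    return math.exp(-c * m_n * math.log1p(m_n / alpha))


if __name__ == "__main__":
    for n in (1e2, 1e4, 1e6, 1e8):
        ok = ratio_power(n, b=0.25, omega=1.0)    # omega - b > b: tends to 1
        bad = ratio_power(n, b=0.5, omega=0.75)   # omega - b < b: tends to 0
        print(f"n={n:.0e}  condition holds: {ok:.6f}  condition fails: {bad:.3e}")
```

The same function can be evaluated with `c=2.0` to match an exponent of $2M_n$; the qualitative behaviour is unchanged.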



Summing up all the results we have, 
(113) P(Z£Rl,e Mn eE,a<a n \Y n )> 
forC n = o(j-j),s>2. 



a 



a + M ri 



M n 



Mr, 



□ 
S-8.3. Proof of Theorem 13.3. Consider the integral



r 1 



+ k) \a + k 



L(0 Mn ,z,Y n )dH(e Mn )dG n (a) 



a 



a + M T 



f r 1 , (y-Qi\ 



xL(@ Mn ,z, Y n )dG (ei)dH^(Q. iMn )dG n (a) 
a + M n -l . = ^. Je^eE., Jo (a + kf\a 



+ k 



xL(0_ iMn ,z, Y n )dH_i{G-iM n )dG n (a), 



(114) 



using the Polya urn representation of $H(\Theta_{M_n})$, given by
(115) 

M„ 



H(QM n ] 



xF_i(G_i Mn ), 



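where $\Theta_{-iM_n} = \Theta_{M_n} \setminus \theta_i$, $H_{-i}(\Theta_{-iM_n})$ is the joint distribution of $\Theta_{-iM_n}$, and
$E_{-i}$ is the set $E$ excluding $\theta_i$.

Let $D = \sum_z \int_{\sigma} \int_{\Theta_{M_n}} L(\Theta_{M_n}, z, Y_n)\,dG_n(\sigma)\,dH(\Theta_{M_n})$. Note that for $z \in R_1^*$,
$\theta_i$ is not present in the likelihood and hence $\{\Theta_{M_n} \in E\} = \{-\infty < \theta_i < \infty\} \cap
\{\Theta_{-iM_n} \in E_{-i}\}$. Then,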



*Je Mn eEJo (o- + k) \a + k 



zeRl JU -iM n 



[ - 



xL(@ Mn ,z, Y^dGoiejdH-iie-wJdGnia) 

Jejo (v + k) \<J + k) 
xL(Q Mn ,z, Y n )dH-i(e- iMn )dG n (o-) 



;<f> 



y 



dG o (0i) 



(116) 



+ k) T \a*(y) + k / 

xP Mn -i (Z G R\,e- iMn G E_ h a< u n \Y n ) , 



where $P_{M_n - 1}(\cdot \mid Y_n)$ is the posterior probability when the mixture model has $M_n - 1$
components.
It can be shown that, under exactly the same conditions as Lemma 13.2,
$P_{M_n - 1}(Z \in R_1^*, \Theta_{-iM_n} \in E_{-i}, \sigma \le \sigma_n \mid Y_n)$ has the same lower bound with only
$M_n$ replaced with $M_n - 1$. Using L'Hôpital's rule it can be easily shown that
$P_{M_n - 1}(Z \in R_1^*, \Theta_{-iM_n} \in E_{-i}, \sigma \le \sigma_n \mid Y_n)$ also converges to 1.

DCT ensures that 



1 J V-Oi \ ,„ , n s f 1 



dG (9i)^ / -e ^dG ( 



(117) 



'I J) 



where the symbols have the same meanings as in (117).
Again, 

1 1 Mn f f an 1 



ID E E 



a + Mn _ 1D ^^j e _^ E _ Jo {a + k) y a + k 

xL(@„ iMn , z,Y n )dH-i(G- iMn )dG n ((T) 

< Hi x 

a + M n - 1 



M, 

X 



i r rcrn 

E n E / / ^(©-iWn^.^dff-iCe-iM-JdGnW 



. „ M n -1 

< f/i X , 

a + M n - 1 ' 

(118) 



where = sup^j^^ } = \. Note that for a y Q(M n ), 



1 and i 1 "- 1 , -»• 0. 
a+M„— 1 



From (117) and (118) we conclude that for $Y_n \in S_n$,



Je Mn &EJo (<r + «) \(r + kj 

(119) -> ^ -e-^^dGoiOi), 



as $n \to \infty$.
S-9. Overview of asymptotic calculations associated with Section 14 of MB.

It is easy to see that the upper bounds of the probabilities given in Lemmas 7.2-7.5
remain the same for this modified model. For the modified SB model the likelihood
function $L(\Theta_{M_n}, z, Y_n, \Pi)$ is given by



M n M n ., . /y._ fl .x2 



(120) L(Q Mn ,z,Y n ,Il) = l[iT^- 1 l[—e * — 

1=1 j=l " 



From the form (120) it is clear that, given $z$, the posterior of $\Pi$ is independent of
$\Theta_{M_n}$. Hence, it is easy to see that the same calculations as in Lemma 7.2 yield the
following bounds for the modified model:

(Vn) z Jn i=1 



and 

exp ( -" ( "+ c 2 l)2 N ) / \ M n , M n 



a + M n 



Z JT1 i=l 



Hence, the upper bound for $P(\sigma > \sigma_n \mid Y_n)$ does not change. Similarly, the
same argument shows that the upper bounds remain the same for all the probabilities
except for $P(Z \in R_1^*, \Theta_{M_n} \in E, \sigma \le \sigma_n \mid Y_n)$. For $P(Z \in R_1^*, \Theta_{M_n} \in
E, \sigma \le \sigma_n \mid Y_n)$, the same calculations as in Lemma 3.3 show that



1 



N < ( — ) e x G ([-a-c,a + c]) x 0(1 - e n ) 



>< E / u-r^u. 

zeR* Jn i=i 



(121) 
Similarly, 



D > x e 8n k « x e 2n x 



a 



2nk n J \ a + M n 

d 

i=i 

(122) x^ / JjTif+^dn. 

2 ^ n 1=1 



Mr, 
Since

\M n nt+Pi-1 



the upper bound remains the same as before. 

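It can also be shown that $E(f_{SB}(y \mid \Theta_{M_n}, \Pi, \sigma))$ converges to $\frac{1}{k}\,\phi\left(\frac{y - \theta^*(y)}{k}\right)$.
We will split the expectation in the same way as in Theorem 3.5 into $S_1, S_2, S_3, S_4, S_5$,
with the integrand $\frac{1}{\sigma+k}\,\phi\left(\frac{y-\theta_i}{\sigma+k}\right)$ replaced with $\sum_{i=1}^{M_n} \pi_i \frac{1}{\sigma+k}\,\phi\left(\frac{y-\theta_i}{\sigma+k}\right)$. The upper
bounds of $S_i$ for $i \ne 4$ will be the same. We illustrate this with $S_1$; for the others
the same arguments will hold.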

1 r r M ™ 1 / a 



n)dF(e M JdG n (a)dn 
< / E7r^(e Mjl ^,i r n,n)^(e M JdG„( f 7)dn 



= ^7)E/ L ( @ M n ,z,Y n )dH(0 Mn )dG n (a), 

using the fact that $\sum_{i=1}^{M_n} \pi_i = 1$. Hence $S_1$ has the same order as $P(Z \in R_1^*, \Theta_{M_n} \in
E, \sigma \le \sigma_n \mid Y_n)$ for the modified model also.

To investigate the form of the density to which the modified SB model converges,
note that
X£(@M„ n)d^(e M JdG n ((r)czn 



1= 

x ]T L(e M „,^i r „,n)dff(e M „)dG ! n ( < 7)dn. 

(«i) c 



(123) 
For each i, using GMVT we get

1 f f 1 Jy-h 



x L{e Mn ,z,Y n ,U)dH(e M JdG n (a)dn 

(Rtr 

1 f y -9* n (y) 



(a* n (y) + ky \a*(y) + k 



x 

D 



l - I I iTi V L(& Mn ,z,Y n ,U)dH(e Mn )dG n (a)da, 



(124) 

where, for every $y$, $\theta_n^*(y) \in (-a-c, a+c)$, and $\sigma_n^*(y) \in (0, \sigma_n)$.
Hence, $S_4$ given by (123) becomes



(y) + k) \a*(y) + k 



l -f f TL(e Mn ,z,Y n ,U)dH(@ Mn )dG n (a)dU, 
UJnJl HRtY 

1 f y -e*( y y 



(y) + ky \a*(y) + k, 
(125) xP((Rl) c ,I A \Y n ), 

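again using the fact that $\sum_{i=1}^{M_n} \pi_i = 1$.

Since it is already shown in Section 7 of MB that $\frac{1}{\sigma_n^*(y)+k}\,\phi\left(\frac{y-\theta_n^*(y)}{\sigma_n^*(y)+k}\right) \to
\frac{1}{k}\,\phi\left(\frac{y-\theta^*(y)}{k}\right)$ and $P((R_1^*)^c, I_4 \mid Y_n) \to 1$, it follows that $S_4 \to \frac{1}{k}\,\phi\left(\frac{y-\theta^*(y)}{k}\right)$.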

With very minor adjustments to the proof of Theorem 7.6, here it can be proved 
that the EW model and the modified SB model converge to the same distribution. 

Hence, it is easy to see that the order of the MISE of the modified SB model
will remain the same as that of the previous version of the SB model.

S-10. Prior predictive convergence rates. 

S-10.1. Models under consideration.

S-10.1.1. Assumptions on the true distribution. Let $f_0$ denote the true distribution.
We assume that $f_0$ is continuously differentiable of all orders. Thus, the
assumptions on $f_0$ for the prior calculations are different from those associated
with the posterior calculations.
S-10.1.2. EW model. We denote $\Theta_n = \{\theta_1, \ldots, \theta_n\}$ and assume $\theta_i \sim F$ and
$F \sim D(\alpha G_0)$, where $D(\alpha G_0)$ denotes the Dirichlet process with central measure
$G_0$ and hyperparameter $\alpha$. Then, according to the EW model, the prior predictive
density at the point $y$ has the following form:

\[ f_{EW}(y \mid \Theta_n, h) = \frac{\alpha}{\alpha + n} A + \frac{1}{\alpha + n} \sum_{i=1}^{n} \frac{1}{h} K\left(\frac{y - \theta_i}{h}\right), \tag{126} \]

where we assume the kernel $K(\cdot)$ to be symmetric around zero, and $A = \int_{\theta} \frac{1}{h} K\left(\frac{y - \theta}{h}\right) G_0(\theta)\,d\theta$.
In our work on calculating prior rates, we determine the optimum value of $h$ by
minimizing the rate of convergence.
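
To make the two components of the EW prior predictive density concrete, the following minimal sketch evaluates it for given $\theta$ values. It rests on assumptions that are ours and not part of the derivation: $K$ is the $N(0,1)$ density and $G_0$ is taken to be $N(0,1)$, in which case the term $A$ is the $N(0, 1 + h^2)$ density evaluated at $y$. The function names are hypothetical.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def f_ew(y, thetas, h, alpha):
    """EW prior predictive density at y, assuming K = N(0,1) and G0 = N(0,1)."""
    n = len(thetas)
    # A = integral of (1/h) K((y - theta)/h) dG0(theta); for Gaussian K and G0
    # this convolution is the N(0, 1 + h^2) density at y.
    a_term = normal_pdf(y, 0.0, math.sqrt(1.0 + h * h))
    # (1/h) K((y - theta_i)/h) is the N(theta_i, h^2) density at y.
    kernel_sum = sum(normal_pdf(y, t, h) for t in thetas)
    return (alpha * a_term + kernel_sum) / (alpha + n)


if __name__ == "__main__":
    thetas = [-1.0, 0.5, 2.0]
    print(f_ew(0.0, thetas, h=0.3, alpha=2.0))
```

Because the weights $\alpha/(\alpha+n)$ and $1/(\alpha+n)$ sum to one over the $n + 1$ components, the resulting function integrates to one, which is a convenient check on any implementation.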

S-10.1.3. SB model. The SB model assumes that the density at the point $y$ is
given by,

\[ f_{SB}(y \mid \Theta_{M_n}, h) = \frac{1}{M_n} \sum_{i=1}^{M_n} \frac{1}{h} K\left(\frac{y - \theta_i}{h}\right), \tag{127} \]

where for each $i$, $\theta_i \sim F$; $F \sim D(\alpha G_0)$, $\Theta_{M_n} = \{\theta_1, \ldots, \theta_{M_n}\}$; $M_n$ being the
maximum number of distinct components the mixture model can have. In both EW
and SB models, we confine attention to the case where $K(\cdot)$ is chosen to be the
$N(0,1)$ density.
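
As an illustration of how $\Theta_{M_n}$ arises once $F$ is integrated out, the sketch below draws $\theta_1, \ldots, \theta_{M_n}$ sequentially from the Polya urn scheme and evaluates the SB density. The choice $G_0 = N(0,1)$ and the function names are assumptions made only for this example.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def polya_urn_sample(m, alpha, rng):
    """Draw theta_1..theta_m from the Polya urn with base measure G0 = N(0,1) (assumed)."""
    thetas = []
    for i in range(m):
        # With probability alpha / (alpha + i) draw a fresh value from G0,
        # otherwise copy one of the i previous values uniformly at random.
        if rng.random() < alpha / (alpha + i):
            thetas.append(rng.gauss(0.0, 1.0))
        else:
            thetas.append(rng.choice(thetas))
    return thetas


def f_sb(y, thetas, h):
    """SB density at y: equal-weight average of N(theta_i, h^2) densities."""
    return sum(normal_pdf(y, t, h) for t in thetas) / len(thetas)


if __name__ == "__main__":
    rng = random.Random(0)
    thetas = polya_urn_sample(50, alpha=1.0, rng=rng)
    print(len(set(thetas)), "distinct values among", len(thetas))
    print(f_sb(0.0, thetas, h=0.4))
```

Ties among the $\theta_i$ are what make the number of distinct mixture components random and at most $M_n$; a larger $\alpha$ produces more distinct values.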

S-10.2. Measure of divergence. To study the rates of convergence we need to
define a measure of divergence. In our study we use the MISE as the measure of
divergence, defined as

(128) $\mathrm{MISE} = \int_y E\left(f(y \mid \Theta) - f(y)\right)^2 dy = \int_y \int_{\Theta} \{f(y \mid \Theta) - f(y)\}^2 \, dH(\Theta)\, dy$,

where the expectation is taken over $\Theta$, the set of parameters of the model, and
$H(\cdot)$ is the joint distribution of $\Theta$ obtained by integrating out the unknown
random measure $F$. Since $F$ is a Dirichlet process in our case, $H(\cdot)$ is composed
of the Polya urn distributions. As mentioned before, $h$ will be determined by
minimizing MISE.
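The double integral in (128) can be approximated by simulation: draw the parameters from the Polya urn, integrate the squared error over a grid in $y$, and average over draws. The sketch below is illustrative only. It assumes the SB form of the density, a true density $f = N(0,1)$ with $G_0 = f$, a Riemann sum on $[-6, 6]$, and hypothetical names (`prior_mise_sb`, `reps`, `seed`).

```python
import math
import random


def normal_pdf(x):
    """Standard normal density; used as kernel K and as the assumed true f."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def polya_urn(m, alpha, rng):
    """theta_1..theta_m from the Polya urn with G0 = N(0, 1) (assumption)."""
    thetas = []
    for i in range(m):
        if rng.random() < alpha / (alpha + i):
            thetas.append(rng.gauss(0.0, 1.0))  # fresh draw from G0
        else:
            thetas.append(rng.choice(thetas))   # tie with an earlier value
    return thetas


def prior_mise_sb(m, alpha, h, reps=100, seed=0):
    """Monte Carlo estimate of the prior MISE (128) for the SB density."""
    rng = random.Random(seed)
    step = 0.1
    grid = [-6.0 + step * k for k in range(121)]
    truth = [normal_pdf(y) for y in grid]
    total = 0.0
    for _ in range(reps):
        thetas = polya_urn(m, alpha, rng)
        ise = 0.0
        for y, fy in zip(grid, truth):
            est = sum(normal_pdf((y - t) / h) for t in thetas) / (m * h)
            ise += (est - fy) ** 2 * step  # Riemann sum over y
        total += ise
    return total / reps  # average over draws approximates the integral over H
```

Calling `prior_mise_sb(30, 0.5, 0.5)` and `prior_mise_sb(30, 1000.0, 0.5)` shows how a small $\alpha$, which produces many ties among the $\theta_i$, inflates the estimate relative to a large $\alpha$.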

We have assumed $h$ to be non-random and determine it by minimizing the MISE
given in (128). We begin with the assumption that $G_0(\cdot) = f(\cdot)$. This assumption
is more pedagogical than practical but we will show subsequently that the prior
convergence rate remains unchanged under less restrictive assumptions on $G_0$. Of
course, it is most appropriate to study the posterior convergence rate since the
information contained in the data ensures that the resultant calculations are valid
under far greater generality; in fact, restrictive assumptions on $G_0$ are not
necessary at all. However, since in the Bayesian paradigm it is often recommended that
the prior parameters be chosen before observing the data, it perhaps makes sense
to choose values of the prior parameters which keep the prior MISE rates at the
desired level. 

Since EW is a special case of SB, we begin by calculating the MISE for the 
SB model; that for EW will follow quite simply then. 

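S-10.2.1. Calculations.

(129) $\mathrm{MISE} = \int_y \int_{\Theta_{M_n}} \{f_{SB}(y \mid \Theta_{M_n}, h) - f(y)\}^2 \, dH(\Theta_{M_n})\, dy = \int_y \{bias_h(y)\}^2 dy + \int_y Var\left(f_{SB}(y \mid \Theta_{M_n}, h)\right) dy$,

where $bias_h(y) = \left| E[f_{SB}(y \mid \Theta_{M_n}, h)] - f(y) \right|$ and $H(\Theta_{M_n})$ is the joint
distribution of $\Theta_{M_n}$. We write,

$bias_h(y) = \left| \int \frac{1}{h} K\left(\frac{y-x}{h}\right) G_0(x)\, dx - f(y) \right|$.

Calculations similar to those in Silverman (1986) yield

(130) $\int_y \{bias_h(y)\}^2 dy = O(h^4)$.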

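Now, the variance term can be expressed as

$Var\left(f_{SB}(y \mid \Theta_{M_n}, h)\right) = \frac{1}{M_n^2 h^2} \sum_{i=1}^{M_n} Var\left(K\left(\frac{y-\theta_i}{h}\right)\right) + \frac{1}{M_n^2 h^2} \sum_{i \neq j} Cov\left(K\left(\frac{y-\theta_i}{h}\right), K\left(\frac{y-\theta_j}{h}\right)\right)$.

Now, assuming that $h$ is small and $M_n$ is large, and expanding $f(y - th)$ as a Taylor
series we obtain

(131) $\int_y \frac{1}{M_n^2 h^2} \sum_{i=1}^{M_n} Var\left(K\left(\frac{y-\theta_i}{h}\right)\right) dy = O\left(\frac{1}{M_n h}\right)$.

For the covariance term,

(132) $\frac{1}{M_n^2 h^2} \sum_{i \neq j} Cov\left(K\left(\frac{y-\theta_i}{h}\right), K\left(\frac{y-\theta_j}{h}\right)\right) = \frac{M_n(M_n-1)}{M_n^2 h^2} Cov\left(K\left(\frac{y-\theta_1}{h}\right), K\left(\frac{y-\theta_2}{h}\right)\right) = \left(1 - \frac{1}{M_n}\right) \frac{1}{h^2} Cov\left(K\left(\frac{y-\theta_1}{h}\right), K\left(\frac{y-\theta_2}{h}\right)\right)$.

Simple calculations, using Taylor's series expansion, yield

(133) $\int_y \left(1 - \frac{1}{M_n}\right) \frac{1}{h^2} Cov\left(K\left(\frac{y-\theta_1}{h}\right), K\left(\frac{y-\theta_2}{h}\right)\right) dy = O\left(\frac{1}{\alpha+1}\right)$.

From (130), (131) and (133) we conclude that

(134) $\mathrm{MISE}(SB) = O\left(\frac{1}{M_n h} + \frac{1}{\alpha+1} + h^4\right)$.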

We will assume $M_n$ and $\alpha$ to be functions of $n$ and will determine $h$ as a function
of $n$ by minimizing the above form of MISE. The resultant form of MISE will
be compared with that corresponding to the EW model. 

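S-10.3. Determination of h. Minimizing (134) with respect to $h$ yields
$h_{opt} = O\left(M_n^{-1/5}\right)$. Plugging $h_{opt}$ in (134) gives the optimal order of
MISE(SB):

(135) $\mathrm{MISE}(SB)_{opt} = O\left(\frac{1}{\alpha+1} + M_n^{-4/5}\right)$.

Choosing $\alpha = O\left(M_n^{\lambda}\right)$ we get $\mathrm{MISE}(SB)_{opt} = O\left(M_n^{-\lambda}\right)$ if
$\lambda < \frac{4}{5}$ and $O\left(M_n^{-4/5}\right)$ if $\lambda > \frac{4}{5}$.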
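The minimization step can be written out explicitly. The following worked derivation is a sketch under one simplifying assumption: the constants hidden in the order terms of (134) are set to 1, so only the two $h$-dependent terms $1/(M_n h)$ and $h^4$ are balanced. The term $1/(\alpha+1)$ does not involve $h$ and is carried through unchanged.

```latex
\begin{align*}
\frac{d}{dh}\left(\frac{1}{M_n h} + h^4\right)
  &= -\frac{1}{M_n h^2} + 4h^3 = 0
  \;\Longrightarrow\; h^5 = \frac{1}{4M_n}
  \;\Longrightarrow\; h_{opt} = (4M_n)^{-1/5}, \\
\frac{1}{M_n h_{opt}} &= 4^{1/5} M_n^{-4/5}, \qquad
h_{opt}^4 = 4^{-4/5} M_n^{-4/5}.
\end{align*}
```

Both $h$-dependent terms are therefore of order $M_n^{-4/5}$ at the optimum. If $\alpha$ grows like a power of $M_n$, the slower of $M_n^{-4/5}$ and $1/(\alpha+1)$ determines the overall order.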






S-10.4. Extension to a more general situation. In the prior-based MISE
calculated so far we have assumed that $G_0(\cdot) = f(\cdot)$. Let us now assume a more
general situation where $G_0(\theta) = [\ldots]\, g(\theta) + \left(1 - [\ldots]\right) f(\theta)$, where $g(\cdot)$ is a
density different from the true density $f(\cdot)$. It can be easily shown that here also
the orders of MISE(SB) and $\mathrm{MISE}(SB)_{opt}$ remain of the same form as (134) and
(135), respectively. 

S-10.5. Prior convergence rates of the EW model. The form of the predictive
distribution of EW is

$f_{EW}(y \mid \Theta_n, h) = \frac{\alpha}{\alpha+n} A + \frac{1}{\alpha+n} \sum_{i=1}^{n} \frac{1}{h} K\left(\frac{y-\theta_i}{h}\right)$.

Note that the conditional distribution of $\theta_i$ given $\Theta_{-in}$ ($= \Theta_n \setminus \theta_i$) is given by
$\frac{\alpha}{\alpha+n-1} G_0(\theta_i) + \frac{1}{\alpha+n-1} \sum_{j \neq i} \delta_{\theta_j}(\theta_i)$. Suppose $\alpha = O(n^{\omega})$. If we
choose $\omega > 1$, then $\frac{\alpha}{\alpha+n-1} \to 1$. This will imply that the conditional distribution
of $\theta_i$ given $\Theta_{-in}$ is close to $G_0(\theta_i)$ for large $n$. This fact will defeat the basic
goal of using a nonparametric prior. Thus we will set $\omega < 1$ and choose $\alpha$ such that
$\frac{n}{\alpha+n} \to 1$. Thus for large $n$ the model will look like
$\frac{1}{n} \sum_{i=1}^{n} \frac{1}{h} K\left(\frac{y-\theta_i}{h}\right)$, a form similar to our form in (127). Calculations
similar to those associated with the MISE of the SB model show that for the EW
model the MISE is given by

(136) $\mathrm{MISE}(EW) = O\left(\frac{1}{(\alpha+n)h} + \frac{1}{\alpha+1} + h^4\right)$.

Minimizing this form of MISE yields $h_{opt}$ which is of the order $\left(\frac{1}{\alpha+n}\right)^{1/5}$.
Plugging $h_{opt}$ in MISE(EW) gives $\mathrm{MISE}(EW)_{opt} = O\left(\frac{1}{\alpha+1} + (\alpha+n)^{-4/5}\right)$.

Note that the order of MISE(EW) is less than that of MISE(SB) unless
$M_n > n$. This is to be expected since so far our calculations are with respect to the
prior only, not involving the data; hence, the more components in the mixture
model, the faster the rate of convergence.
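The comparison of the two optimal orders can be checked numerically. The sketch below is illustrative only: it sets the constants in the order terms to 1, takes $M_n = n^b$ and $\alpha = n^{\omega}$, uses the EW optimum obtained by minimizing (136) in the same way as (134), and the function names are hypothetical.

```python
def mise_sb_opt(m_n, alpha):
    """Order of MISE(SB) at the optimal h, with constants set to 1 (assumption)."""
    return 1.0 / (alpha + 1.0) + m_n ** (-0.8)


def mise_ew_opt(n, alpha):
    """Order of MISE(EW) at the optimal h, with constants set to 1 (assumption)."""
    return 1.0 / (alpha + 1.0) + (alpha + n) ** (-0.8)


def compare(n, b, omega):
    """Return (EW order, SB order) for M_n = n**b and alpha = n**omega."""
    m_n = n ** b
    alpha = n ** omega
    return mise_ew_opt(n, alpha), mise_sb_opt(m_n, alpha)
```

With `compare(1000, 0.2, 0.05)`, where $M_n$ is far smaller than $n$, the EW order is the smaller of the two; with an exponent such as `b = 1.5`, where $M_n$ exceeds $n$, the ordering reverses.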

S-11. Comparison between prior and posterior convergence rates. To provide
a feel for the prior and the posterior rates associated with the two competing
models of EW and SB, we present plots of the orders of MISE(EW) and
MISE(SB) for both prior and posterior predictive cases, for appropriate values
of the parameters involved in the rates of convergence. 

We assume the following forms: M„ = n b , a = n u , a"i = -^—r, e* = \, 

o n > n 4e n n 

where $r = 3$, and $t > 0$ (the MISEs will be compared for different values of $t$). 
By equating e* = \ , we can calculate the value of which is used to get a closed 
form expression for e* M . In the above forms we assume that $b = 0.2$, $\mu_0 = 2.0$, and 
$\sigma_0 = 1.0$. For different values of $t$ and $\omega$, figures S-1 and S-2 present and compare 
the prior based rates of MISE(EW) and MISE(SB). On the other hand, figures S-3 and
S-4 compare the MISE rates of EW and SB based on the posterior. 

As already discussed in Section S-10.5, figures S-1 and S-2 show that the prior 
EW rates are faster than the prior SB rates, which is expected, since, in this
illustration, $M_n < n$. On the other hand, figures S-3 and S-4 show that the posterior 
SB rates are far superior to the posterior EW rates. 

Supplement to "Bayesian MISE convergence rates of mixture models based 
on the Polya urn model: asymptotic comparisons and choice of prior parameters".
Section S-2 contains proofs of results associated with Section 6; Section S-3 
contains proofs of the results presented in Section 7; the proofs of the results
provided in Section 8 are given in Section S-4; Section S-5 contains proofs of the 
results associated with Section 9; the proofs of the results presented in Section 12 
are provided in Section S-6; Section S-7 contains a method of splitting the integrals 
needed for proofs of the results associated with the "large p, small n" problem in 
the case of the SB model. Proofs of the results stated in Section 13 are provided in 
Section S-8; Section S-9 contains the proofs of the results corresponding to Section 
14; Section S-10 contains results and proofs associated with the Bayesian MISE-based
prior predictive convergence rates; and finally, comparison between prior and 
posterior Bayesian MISE-based convergence rates is provided in Section S-11. 

FIG S-1. Prior predictive case with $\omega = 0.05$. The black curve corresponds to the prior predictive 
MISE of the EW model and the red curve corresponds to that of the SB model. The left and the 
right panels correspond to $t = 2$ and $t = 5$, respectively. 




FIG S-2. Prior predictive case with $\omega = 0.1$. The black curve corresponds to the prior predictive 
MISE of the EW model and the red curve corresponds to that of the SB model. The left and the 
right panels correspond to $t = 2$ and $t = 5$, respectively. 

FIG S-3. Posterior predictive case with $\omega = 0.05$. The black curve corresponds to the posterior
predictive MISE of the EW model and the red curve corresponds to that of the SB model. The left
and the right panels correspond to $t = 2$ and $t = 5$, respectively. 







FIG S-4. Posterior predictive case with $\omega = 0.1$. The black curve corresponds to the posterior
predictive MISE of the EW model and the red curve corresponds to that of the SB model. The left
and the right panels correspond to $t = 2$ and $t = 5$, respectively. 